Preface


The response of students and teachers to the first four editions of Linear Algebra and Its 
Applications has been most gratifying. This Fifth Edition provides substantial support 
both for teaching and for using technology in the course. As before, the text provides 
a modern elementary introduction to linear algebra and a broad selection of interesting
applications. The material is accessible to students with the maturity that should
come from successful completion of two semesters of college-level mathematics, usually
calculus.

The main goal of the text is to help students master the basic concepts and skills they 
will use later in their careers. The topics here follow the recommendations of the Linear 
Algebra Curriculum Study Group, which were based on a careful investigation of the 
real needs of the students and a consensus among professionals in many disciplines that 
use linear algebra. We hope this course will be one of the most useful and interesting 
mathematics classes taken by undergraduates. 


WHAT'S NEW IN THIS EDITION 


The main goals of this revision were to update the exercises, take advantage of improvements
in technology, and provide more support for conceptual learning.

1. Support for the Fifth Edition is offered through MyMathLab. MyMathLab, from 
Pearson, is the world’s leading online resource in mathematics, integrating interactive
homework, assessment, and media in a flexible, easy-to-use format. Students
submit homework online for instantaneous feedback, support, and assessment. This
system works particularly well for computation-based skills. Many additional resources
are also provided through the MyMathLab web site.

2. The Fifth Edition of the text is available in an interactive electronic format. Using 
the CDF player, a free Mathematica player available from Wolfram, students can 
interact with figures and experiment with matrices by looking at numerous examples 
with just the click of a button. The geometry of linear algebra comes alive through 
these interactive figures. Students are encouraged to develop conjectures through 
experimentation and then verify that their observations are correct by examining the 
relevant theorems and their proofs. The resources in the interactive version of the 
text give students the opportunity to play with mathematical objects and ideas much 
as we do with our own research. Files for Wolfram CDF Player are also available for 
classroom presentations. 

3. The Fifth Edition includes additional support for concept- and proof-based learning. 
Conceptual Practice Problems and their solutions have been added so that most sections
now have a proof- or concept-based example for students to review. Additional
guidance has also been added to some of the proofs of theorems in the body of the 
textbook. 


4. More than 25 percent of the exercises are new or updated, especially the computational
exercises. The exercise sets remain one of the most important features of this
book, and these new exercises follow the same high standard of the exercise sets from 
the past four editions. They are crafted in a way that reflects the substance of each 
of the sections they follow, developing the students’ confidence while challenging 
them to practice and generalize the new ideas they have encountered. 


DISTINCTIVE FEATURES 


Early Introduction of Key Concepts 

Many fundamental ideas of linear algebra are introduced within the first seven lectures, 
in the concrete setting of $\mathbb{R}^n$, and then gradually examined from different points of view.
Later generalizations of these concepts appear as natural extensions of familiar ideas, 
visualized through the geometric intuition developed in Chapter 1. A major achievement 
of this text is that the level of difficulty is fairly even throughout the course. 

A Modern View of Matrix Multiplication 

Good notation is crucial, and the text reflects the way scientists and engineers actually 
use linear algebra in practice. The definitions and proofs focus on the columns of a matrix
rather than on the matrix entries. A central theme is to view a matrix-vector product
Ax as a linear combination of the columns of A. This modern approach simplifies many
arguments, and it ties vector space ideas into the study of linear systems. 


Linear Transformations 

Linear transformations form a “thread” that is woven into the fabric of the text. Their 
use enhances the geometric flavor of the text. In Chapter 1, for instance, linear transformations
provide a dynamic and graphical view of matrix-vector multiplication.

Eigenvalues and Dynamical Systems 

Eigenvalues appear fairly early in the text, in Chapters 5 and 7. Because this material 
is spread over several weeks, students have more time than usual to absorb and review 
these critical concepts. Eigenvalues are motivated by and applied to discrete and continuous
dynamical systems, which appear in Sections 1.10, 4.8, and 4.9, and in five
sections of Chapter 5. Some courses reach Chapter 5 after about five weeks by covering 
Sections 2.8 and 2.9 instead of Chapter 4. These two optional sections present all the 
vector space concepts from Chapter 4 needed for Chapter 5. 

Orthogonality and Least-Squares Problems 

These topics receive a more comprehensive treatment than is commonly found in beginning
texts. The Linear Algebra Curriculum Study Group has emphasized the need for
a substantial unit on orthogonality and least-squares problems, because orthogonality 
plays such an important role in computer calculations and numerical linear algebra and 
because inconsistent linear systems arise so often in practical work.


PEDAGOGICAL FEATURES

Applications 

A broad selection of applications illustrates the power of linear algebra to explain fundamental
principles and simplify calculations in engineering, computer science, mathematics,
physics, biology, economics, and statistics. Some applications appear in separate
sections; others are treated in examples and exercises. In addition, each chapter opens 
with an introductory vignette that sets the stage for some application of linear algebra 
and provides a motivation for developing the mathematics that follows. Later, the text 
returns to that application in a section near the end of the chapter. 

A Strong Geometric Emphasis 

Every major concept in the course is given a geometric interpretation, because many 
students learn better when they can visualize an idea. There are substantially more 
drawings here than usual, and some of the figures have never before appeared in a linear 
algebra text. Interactive versions of these figures, and more, appear in the electronic 
version of the textbook. 

Examples 

This text devotes a larger proportion of its expository material to examples than do most 
linear algebra texts. There are more examples than an instructor would ordinarily present 
in class. But because the examples are written carefully, with lots of detail, students can 
read them on their own. 

Theorems and Proofs 

Important results are stated as theorems. Other useful facts are displayed in tinted boxes, 
for easy reference. Most of the theorems have formal proofs, written with the beginning
student in mind. In a few cases, the essential calculations of a proof are exhibited in a
carefully chosen example. Some routine verifications are saved for exercises, when they 
will benefit students. 

Practice Problems 

A few carefully selected Practice Problems appear just before each exercise set. Complete
solutions follow the exercise set. These problems either focus on potential trouble
spots in the exercise set or provide a “warm-up” for the exercises, and the solutions 
often contain helpful hints or warnings about the homework. 

Exercises 

The abundant supply of exercises ranges from routine computations to conceptual questions
that require more thought. A good number of innovative questions pinpoint conceptual
difficulties that we have found on student papers over the years. Each exercise
set is carefully arranged in the same general order as the text; homework assignments 
are readily available when only part of a section is discussed. A notable feature of the 
exercises is their numerical simplicity. Problems “unfold” quickly, so students spend 
little time on numerical calculations. The exercises concentrate on teaching understanding
rather than mechanical calculations. The exercises in the Fifth Edition maintain the
integrity of the exercises from previous editions, while providing fresh problems for 
students and instructors. 

Exercises marked with the symbol [M] are designed to be worked with the aid of a
“matrix program” (a computer program, such as MATLAB®, Maple™, Mathematica®,
MathCad®, or Derive™, or a programmable calculator with matrix capabilities, such as
those manufactured by Texas Instruments).

True/False Questions 

To encourage students to read all of the text and to think critically, we have developed
300 simple true/false questions that appear in 33 sections of the text, just after
the computational problems. They can be answered directly from the text, and they 
prepare students for the conceptual problems that follow. Students appreciate these 
questions — after they get used to the importance of reading the text carefully. Based 
on class testing and discussions with students, we decided not to put the answers in the 
text. (The Study Guide tells the students where to find the answers to the odd-numbered 
questions.) An additional 150 true/false questions (mostly at the ends of chapters) test 
understanding of the material. The text does provide simple T/F answers to most of 
these questions, but it omits the justifications for the answers (which usually require 
some thought). 

Writing Exercises 

An ability to write coherent mathematical statements in English is essential for all students
of linear algebra, not just those who may go to graduate school in mathematics.
The text includes many exercises for which a written justification is part of the answer. 
Conceptual exercises that require a short proof usually contain hints that help a student 
get started. For all odd-numbered writing exercises, either a solution is included at the 
back of the text or a hint is provided and the solution is given in the Study Guide,
described below. 

Computational Topics

The text stresses the impact of the computer on both the development and practice of 
linear algebra in science and engineering. Frequent Numerical Notes draw attention 
to issues in computing and distinguish between theoretical concepts, such as matrix 
inversion, and computer implementations, such as LU factorizations. 


WEB SUPPORT 


MyMathLab-Online Homework and Resources 

Support for the Fifth Edition is offered through MyMathLab (www.mymathlab.com).
MyMathLab from Pearson is the world’s leading online resource in mathematics, integrating
interactive homework, assessment, and media in a flexible, easy-to-use format.
MyMathLab contains hundreds of algorithmically generated exercises that mirror those 
in the textbook. Students submit homework online for instantaneous feedback, support, 
and assessment. This system works particularly well for supporting computation-based 
skills. Many additional resources are also provided through the MyMathLab web site. 

Interactive Textbook 

The Fifth Edition of the text is available in an interactive electronic format within 
MyMathLab. Using Wolfram CDF Player, a free Mathematica player available from 
Wolfram (www.wolfram.com/player), students can interact with figures and experiment
with matrices by looking at numerous examples. The geometry of linear algebra comes 
alive through these interactive figures. Students are encouraged to develop conjectures
through experimentation, then verify that their observations are correct by examining 
the relevant theorems and their proofs. The resources in the interactive version of the 
text give students the opportunity to interact with mathematical objects and ideas much 
as we do with our own research. 


The web site at www.pearsonhighered.com/lay contains all of the support material
referenced below. These materials are also available within MyMathLab. 

Review Material 

Review sheets and practice exams (with solutions) cover the main topics in the text. 
They come directly from courses we have taught in the past years. Each review sheet 
identifies key definitions, theorems, and skills from a specified portion of the text. 

Applications by Chapters 

The web site contains seven Case Studies, which expand topics introduced at the beginning
of each chapter, adding real-world data and opportunities for further exploration. In
addition, more than 20 Application Projects either extend topics in the text or introduce 
new applications, such as cubic splines, airline flight routes, dominance matrices in 
sports competition, and error-correcting codes. Some mathematical applications are 
integration techniques, polynomial root location, conic sections, quadric surfaces, and 
extrema for functions of two variables. Numerical linear algebra topics, such as condition
numbers, matrix factorizations, and the QR method for finding eigenvalues, are
also included. Woven into each discussion are exercises that may involve large data sets 
(and thus require technology for their solution). 

Getting Started with Technology 

If your course includes some work with MATLAB, Maple, Mathematica, or TI calculators,
the Getting Started guides provide a “quick start guide” for students.

Technology-specific projects are also available to introduce students to software 
and calculators. They are available on www.pearsonhighered.com/lay and within 
MyMathLab. Finally, the Study Guide provides introductory material for first-time 
technology users. 


Data Files 

Hundreds of files contain data for about 900 numerical exercises in the text, Case 
Studies, and Application Projects. The data are available in a variety of formats—for 
MATLAB, Maple, Mathematica, and the Texas Instruments graphing calculators. By 
allowing students to access matrices and vectors for a particular problem with only a few 
keystrokes, the data files eliminate data entry errors and save time on homework. These 
data files are available for download at www.pearsonhighered.com/lay and MyMathLab. 


Projects 


Exploratory projects for Mathematica, Maple, and MATLAB invite students to discover
basic mathematical and numerical issues in linear algebra. Written by experienced
faculty members, these projects are referenced by the WEB icon at appropriate
points in the text. The projects explore fundamental concepts such as the column space, 
diagonalization, and orthogonal projections; several projects focus on numerical issues 
such as flops, iterative methods, and the SVD; and a few projects explore applications 
such as Lagrange interpolation and Markov chains.


SUPPLEMENTS


Study Guide 


A printed version of the Study Guide is available at low cost. It is also available electronically
within MyMathLab. The Guide is designed to be an integral part of the course. The
SG icon in the text directs students to special subsections of the Guide that suggest how
to master key concepts of the course. The Guide supplies a detailed solution to every
third odd-numbered exercise, which allows students to check their work. A complete 
explanation is provided whenever an odd-numbered writing exercise has only a “Hint” 
in the answers. Frequent “Warnings” identify common errors and show how to prevent 
them. MATLAB boxes introduce commands as they are needed. Appendixes in the Study 
Guide provide comparable information about Maple, Mathematica, and TI graphing 
calculators (ISBN: 0-321-98257-6). 


Instructor’s Edition 

For the convenience of instructors, this special edition includes brief answers to all 
exercises. A Note to the Instructor at the beginning of the text provides a commentary 
on the design and organization of the text, to help instructors plan their courses. It also 
describes other support available for instructors (ISBN: 0-321-98261-4). 

Instructor’s Technology Manuals 

Each manual provides detailed guidance for integrating a specific software package or 
graphing calculator throughout the course, written by faculty who have already used 
the technology with this text. The following manuals are available to qualified instructors
through the Pearson Instructor Resource Center, www.pearsonhighered.com/irc and
MyMathLab: MATLAB (ISBN: 0-321-98985-6), Maple (ISBN: 0-134-04726-5), 
Mathematica (ISBN: 0-321-98975-9), and TI-83+/89 (ISBN: 0-321-98984-8). 


Instructor’s Solutions Manual 

The Instructor’s Solutions Manual (ISBN 0-321-98259-2) contains detailed solutions 
for all exercises, along with teaching notes for many sections. The manual is available 
electronically for download in the Instructor Resource Center (www.pearsonhighered.com/lay)
and MyMathLab.

PowerPoint® Slides and Other Teaching Tools 

A brisk pace at the beginning of the course helps to set the tone for the term. To get 
quickly through the first two sections in fewer than two lectures, consider using 
PowerPoint® slides (ISBN 0-321-98264-9). They permit you to focus on the process 
of row reduction rather than to write many numbers on the board. Students can receive 
a condensed version of the notes, with occasional blanks to fill in during the lecture. 
(Many students respond favorably to this gesture.) The PowerPoint slides are available 
for 25 core sections of the text. In addition, about 75 color figures from the text are 
available as PowerPoint slides. The PowerPoint slides are available for download at 
www.pearsonhighered.com/irc. Interactive figures are available as Wolfram CDF Player 
files for classroom demonstrations. These files provide the instructor with the opportunity
to bring the geometry alive and to encourage students to make conjectures by
looking at numerous examples. The files are available exclusively within MyMathLab.

TestGen

TestGen (www.pearsonhighered.com/testgen) enables instructors to build, edit, print,
and administer tests using a computerized bank of questions developed to cover all the
objectives of the text. TestGen is algorithmically based, allowing instructors to create
multiple, but equivalent, versions of the same question or test with the click of a button.
Instructors can also modify test bank questions or add new questions. The software
and test bank are available for download from Pearson Education’s online catalog.
(ISBN: 0-321-98260-6)


A Note to Students


This course is potentially the most interesting and worthwhile undergraduate mathematics
course you will complete. In fact, some students have written or spoken to us
after graduation and said that they still use this text occasionally as a reference in their 
careers at major corporations and engineering graduate schools. The following remarks 
offer some practical advice and information to help you master the material and enjoy 
the course. 

In linear algebra, the concepts are as important as the computations. The simple
numerical exercises that begin each exercise set only help you check your understanding 
of basic procedures. Later in your career, computers will do the calculations, but you 
will have to choose the calculations, know how to interpret the results, and then explain 
the results to other people. For this reason, many exercises in the text ask you to explain 
or justify your calculations. A written explanation is often required as part of the answer. 
For odd-numbered exercises, you will find either the desired explanation or at least a 
good hint. You must avoid the temptation to look at such answers before you have tried 
to write out the solution yourself. Otherwise, you are likely to think you understand 
something when in fact you do not. 

To master the concepts of linear algebra, you will have to read and reread the text 
carefully. New terms are in boldface type, sometimes enclosed in a definition box. A 
glossary of terms is included at the end of the text. Important facts are stated as theorems 
or are enclosed in tinted boxes, for easy reference. We encourage you to read the first 
five pages of the Preface to learn more about the structure of this text. This will give 
you a framework for understanding how the course may proceed. 

In a practical sense, linear algebra is a language. You must learn this language the 
same way you would a foreign language—with daily work. Material presented in one 
section is not easily understood unless you have thoroughly studied the text and worked 
the exercises for the preceding sections. Keeping up with the course will save you lots 
of time and distress! 

Numerical Notes 

We hope you read the Numerical Notes in the text, even if you are not using a computer 
or graphing calculator with the text. In real life, most applications of linear algebra 
involve numerical computations that are subject to some numerical error, even though 
that error may be extremely small. The Numerical Notes will warn you of potential 
difficulties in using linear algebra later in your career, and if you study the notes now, 
you are more likely to remember them later. 

If you enjoy reading the Numerical Notes, you may want to take a course later in 
numerical linear algebra. Because of the high demand for increased computing power, 
computer scientists and mathematicians work in numerical linear algebra to develop 
faster and more reliable algorithms for computations, and electrical engineers design 
faster and smaller computers to run the algorithms. This is an exciting field, and your 
first course in linear algebra will help you prepare for it.


Study Guide 


To help you succeed in this course, we suggest that you purchase the Study
Guide (www.mypearsonstore.com; 0-321-98257-6). It is available electronically within
MyMathLab. Not only will it help you learn linear algebra, it also will show you how to
study mathematics. At strategic points in your textbook, the SG icon will direct you to
special subsections in the Study Guide entitled “Mastering Linear Algebra Concepts.” 
There you will find suggestions for constructing effective review sheets of key concepts. 
The act of preparing the sheets is one of the secrets to success in the course, because 
you will construct links between ideas. These links are the “glue” that enables you to 
build a solid foundation for learning and remembering the main concepts in the course. 

The Study Guide contains a detailed solution to every third odd-numbered exercise, 
plus solutions to all odd-numbered writing exercises for which only a hint is given in the 
Answers section of this book. The Guide is separate from the text because you must learn 
to write solutions by yourself, without much help. (We know from years of experience 
that easy access to solutions in the back of the text slows the mathematical development 
of most students.) The Guide also provides warnings of common errors and helpful hints 
that call attention to key exercises and potential exam questions. 

If you have access to technology (MATLAB, Maple, Mathematica, or a TI graphing
calculator), you can save many hours of homework time. The Study Guide is
your “lab manual” that explains how to use each of these matrix utilities. It introduces
new commands when they are needed. You can download from the web site
www.pearsonhighered.com/lay the data for more than 850 exercises in the text. (With 
a few keystrokes, you can display any numerical homework problem on your screen.) 
Special matrix commands will perform the computations for you! 

What you do in your first few weeks of studying this course will set your pattern 
for the term and determine how well you finish the course. Please read “How to Study 
Linear Algebra” in the Study Guide as soon as possible. Many students have found the 
strategies there very helpful, and we hope you will, too. 







Linear Equations in 
Linear Algebra 


INTRODUCTORY EXAMPLE 

Linear Models in Economics 
and Engineering 




It was late summer in 1949. Harvard Professor Wassily 
Leontief was carefully feeding the last of his punched cards 
into the university’s Mark II computer. The cards contained 
information about the U.S. economy and represented a 
summary of more than 250,000 pieces of information 
produced by the U.S. Bureau of Labor Statistics after two 
years of intensive work. Leontief had divided the U.S. 
economy into 500 “sectors,” such as the coal industry, 
the automotive industry, communications, and so on. 
For each sector, he had written a linear equation that 
described how the sector distributed its output to the other 
sectors of the economy. Because the Mark II, one of the 
largest computers of its day, could not handle the resulting 
system of 500 equations in 500 unknowns, Leontief had 
distilled the problem into a system of 42 equations in 
42 unknowns. 

Programming the Mark II computer for Leontief’s 42 
equations had required several months of effort, and he 
was anxious to see how long the computer would take to 
solve the problem. The Mark II hummed and blinked for 56 
hours before finally producing a solution. We will discuss 
the nature of this solution in Sections 1.6 and 2.6. 


Leontief, who was awarded the 1973 Nobel Prize
in Economic Science, opened the door to a new era
in mathematical modeling in economics. His efforts
at Harvard in 1949 marked one of the first significant
uses of computers to analyze what was then a large-scale
mathematical model. Since that time, researchers
in many other fields have employed computers to analyze
mathematical models. Because of the massive amounts of
data involved, the models are usually linear; that is, they
are described by systems of linear equations.

The importance of linear algebra for applications has
risen in direct proportion to the increase in computing
power, with each new generation of hardware and
software triggering a demand for even greater capabilities.
Computer science is thus intricately linked with linear
algebra through the explosive growth of parallel processing
and large-scale computations.

Scientists and engineers now work on problems far
more complex than even dreamed possible a few decades
ago. Today, linear algebra has more potential value for
students in many scientific and business fields than any
other undergraduate mathematics subject! The material in
this text provides the foundation for further work in many
interesting areas. Here are a few possibilities; others will
be described later.


• Oil exploration. When a ship searches for offshore
oil deposits, its computers solve thousands of
separate systems of linear equations every day.
The seismic data for the equations are obtained
from underwater shock waves created by explosions
from air guns. The waves bounce off subsurface
rocks and are measured by geophones attached to
mile-long cables behind the ship.

• Linear programming. Many important management
decisions today are made on the basis of linear
programming models that use hundreds of variables.
The airline industry, for instance, employs linear
programs that schedule flight crews, monitor the
locations of aircraft, or plan the varied schedules of
support services such as maintenance and terminal
operations.

• Electrical networks. Engineers use simulation
software to design electrical circuits and microchips
involving millions of transistors. Such software
relies on linear algebra techniques and systems of
linear equations.

WEB 


Systems of linear equations lie at the heart of linear algebra, and this chapter uses them 
to introduce some of the central concepts of linear algebra in a simple and concrete 
setting. Sections 1.1 and 1.2 present a systematic method for solving systems of linear 
equations. This algorithm will be used for computations throughout the text. Sections 1.3 
and 1.4 show how a system of linear equations is equivalent to a vector equation and to a 
matrix equation. This equivalence will reduce problems involving linear combinations 
of vectors to questions about systems of linear equations. The fundamental concepts of 
spanning, linear independence, and linear transformations, studied in the second half of 
the chapter, will play an essential role throughout the text as we explore the beauty and 
power of linear algebra. 


1.1 SYSTEMS OF LINEAR EQUATIONS 


A linear equation in the variables $x_1, \ldots, x_n$ is an equation that can be written in the
form

$$a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = b \tag{1}$$

where $b$ and the coefficients $a_1, \ldots, a_n$ are real or complex numbers, usually known
in advance. The subscript $n$ may be any positive integer. In textbook examples and
exercises, $n$ is normally between 2 and 5. In real-life problems, $n$ might be 50 or 5000,
or even larger.

The equations

$$4x_1 - 5x_2 + 2 = x_1 \quad\text{and}\quad x_2 = 2(\sqrt{6} - x_1) + x_3$$

are both linear because they can be rearranged algebraically as in equation (1):

$$3x_1 - 5x_2 = -2 \quad\text{and}\quad 2x_1 + x_2 - x_3 = 2\sqrt{6}$$

The equations

$$4x_1 - 5x_2 = x_1 x_2 \quad\text{and}\quad x_2 = 2\sqrt{x_1} - 6$$

are not linear because of the presence of $x_1 x_2$ in the first equation and $\sqrt{x_1}$ in the second.
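
For readers who want the intermediate algebra, the rearrangement of the two linear examples into the form of equation (1) can be written out as follows. This is a sketch of one possible sequence of steps; the text itself states only the starting equations and the results.

```latex
\begin{align*}
4x_1 - 5x_2 + 2 = x_1
  &\;\Longrightarrow\; 4x_1 - x_1 - 5x_2 = -2
  \;\Longrightarrow\; 3x_1 - 5x_2 = -2, \\
x_2 = 2(\sqrt{6} - x_1) + x_3
  &\;\Longrightarrow\; x_2 = 2\sqrt{6} - 2x_1 + x_3
  \;\Longrightarrow\; 2x_1 + x_2 - x_3 = 2\sqrt{6}.
\end{align*}
```

In the notation of equation (1), the first equation has $a_1 = 3$, $a_2 = -5$, $b = -2$, and the second has $a_1 = 2$, $a_2 = 1$, $a_3 = -1$, $b = 2\sqrt{6}$.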

A system of linear equations (or a linear system) is a collection of one or more
linear equations involving the same variables, say $x_1, \ldots, x_n$. An example is

$$
\begin{array}{rcrcrcr}
2x_1 & - & x_2 & + & 1.5x_3 & = & 8 \\
x_1 & & & - & 4x_3 & = & -7
\end{array}
\tag{2}
$$

A solution of the system is a list $(s_1, s_2, \ldots, s_n)$ of numbers that makes each equation a
true statement when the values $s_1, \ldots, s_n$ are substituted for $x_1, \ldots, x_n$, respectively. For
instance, $(5, 6.5, 3)$ is a solution of system (2) because, when these values are substituted
in (2) for $x_1, x_2, x_3$, respectively, the equations simplify to $8 = 8$ and $-7 = -7$.
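
As a quick check of the claim above, the following Python sketch substitutes a list of values into system (2). Storing each equation as a (coefficients, right-hand side) pair and the helper name `is_solution` are illustrative assumptions, not notation from the text; the right side $-7$ of the second equation is taken from the simplification "$-7 = -7$" quoted above.

```python
# System (2): 2*x1 - x2 + 1.5*x3 = 8 and x1 - 4*x3 = -7.
# Each equation is stored as (coefficients, right-hand side).
system_2 = [
    ([2.0, -1.0, 1.5], 8.0),
    ([1.0, 0.0, -4.0], -7.0),
]


def is_solution(system, values, tol=1e-12):
    """Return True if substituting `values` makes every equation a true statement."""
    for coeffs, rhs in system:
        lhs = sum(a * s for a, s in zip(coeffs, values))
        if abs(lhs - rhs) > tol:
            return False
    return True


print(is_solution(system_2, (5, 6.5, 3)))  # the list discussed in the text
print(is_solution(system_2, (1, 1, 1)))    # an arbitrary list, for contrast
```

The small tolerance `tol` is there only because the sketch uses floating-point numbers; with exact arithmetic the comparison could be a plain equality.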

The set of all possible solutions is called the solution set of the linear system. Two 
linear systems are called equivalent if they have the same solution set. That is, each 
solution of the first system is a solution of the second system, and each solution of the 
second system is a solution of the first. 

Finding the solution set of a system of two linear equations in two variables is easy 
because it amounts to finding the intersection of two lines. A typical problem is 

$$
\begin{aligned}
x_1 - 2x_2 &= -1 \\
-x_1 + 3x_2 &= 3
\end{aligned}
$$

The graphs of these equations are lines, which we denote by $\ell_1$ and $\ell_2$. A pair of numbers
$(x_1, x_2)$ satisfies both equations in the system if and only if the point $(x_1, x_2)$ lies on both
$\ell_1$ and $\ell_2$. In the system above, the solution is the single point $(3, 2)$, as you can easily
verify. See Figure 1.




FIGURE 1 Exactly one solution. 
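
The intersection point can also be computed directly. The sketch below uses the $2 \times 2$ determinant formula (Cramer's rule) with exact fractions; this is a shortcut chosen for illustration, not the elimination method that the text develops, and the function name `solve_2x2` is hypothetical.

```python
from fractions import Fraction


def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*x1 + a12*x2 = b1 and a21*x1 + a22*x2 = b2
    when the two lines cross in exactly one point."""
    det = Fraction(a11 * a22 - a12 * a21)
    if det == 0:
        raise ValueError("lines are parallel or coincide; no single intersection point")
    x1 = Fraction(b1 * a22 - a12 * b2) / det
    x2 = Fraction(a11 * b2 - b1 * a21) / det
    return x1, x2


# The system  x1 - 2*x2 = -1,  -x1 + 3*x2 = 3  from the text.
x1, x2 = solve_2x2(1, -2, -1, -1, 3, 3)
print(x1, x2)
```

Substituting the printed values back into both equations is the "easy verification" the text mentions.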



Of course, two lines need not intersect in a single point—they could be parallel, or 
they could coincide and hence “intersect” at every point on the line. Figure 2 shows the 
graphs that correspond to the following systems: 

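$$
\text{(a)}\quad
\begin{aligned}
x_1 - 2x_2 &= -1 \\
-x_1 + 2x_2 &= 3
\end{aligned}
\qquad\qquad
\text{(b)}\quad
\begin{aligned}
x_1 - 2x_2 &= -1 \\
-x_1 + 2x_2 &= 1
\end{aligned}
$$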



FIGURE 2 (a) No solution, (b) Infinitely many solutions. 


Figures 1 and 2 illustrate the following general fact about linear systems, to be 
verified in Section 1.2.

A system of linear equations has

1. no solution, or

2. exactly one solution, or

3. infinitely many solutions.


A system of linear equations is said to be consistent if it has either one solution or 
infinitely many solutions; a system is inconsistent if it has no solution. 
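For two equations in two variables, the three cases can be told apart without drawing the lines. The sketch below is one way to do it in Python; the function name `classify_2x2` is hypothetical, and the test it uses (a determinant-style comparison of coefficients) is our choice rather than a method introduced in this section. It assumes each equation really describes a line, that is, has at least one nonzero coefficient.

```python
from fractions import Fraction

def classify_2x2(a1, b1, c1, a2, b2, c2):
    """Classify the system a1*x1 + b1*x2 = c1, a2*x1 + b2*x2 = c2.

    Assumes each equation has at least one nonzero coefficient.
    """
    a1, b1, c1, a2, b2, c2 = map(Fraction, (a1, b1, c1, a2, b2, c2))
    if a1 * b2 - a2 * b1 != 0:
        return "exactly one solution"        # the lines cross at one point
    # The lines are parallel; they coincide only if the full equations
    # (coefficients and constants) are proportional.
    if a1 * c2 - a2 * c1 == 0 and b1 * c2 - b2 * c1 == 0:
        return "infinitely many solutions"
    return "no solution"

print(classify_2x2(1, -2, -1, -1, 3, 3))  # system of Figure 1
print(classify_2x2(1, -2, -1, -1, 2, 3))  # system (a) of Figure 2
print(classify_2x2(1, -2, -1, -1, 2, 1))  # system (b) of Figure 2
```

In the vocabulary just introduced, the first and third systems are consistent and the second is inconsistent.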


Matrix Notation 


The essential information of a linear system can be recorded compactly in a rectangular 
array called a matrix. Given the system 


$$\begin{aligned} x_1 - 2x_2 + x_3 &= 0 \\ 2x_2 - 8x_3 &= 8 \\ 5x_1 - 5x_3 &= 10 \end{aligned} \tag{3}$$

with the coefficients of each variable aligned in columns, the matrix

$$\begin{bmatrix} 1 & -2 & 1 \\ 0 & 2 & -8 \\ 5 & 0 & -5 \end{bmatrix}$$

is called the coefficient matrix (or matrix of coefficients) of the system (3), and

$$\begin{bmatrix} 1 & -2 & 1 & 0 \\ 0 & 2 & -8 & 8 \\ 5 & 0 & -5 & 10 \end{bmatrix} \tag{4}$$

is called the augmented matrix of the system. (The second row here contains a zero
because the second equation could be written as $0 \cdot x_1 + 2x_2 - 8x_3 = 8$.) An augmented
matrix of a system consists of the coefficient matrix with an added column containing
the constants from the right sides of the equations.

The size of a matrix tells how many rows and columns it has. The augmented matrix 
(4) above has 3 rows and 4 columns and is called a $3 \times 4$ (read “3 by 4”) matrix. If $m$ and
$n$ are positive integers, an $m \times n$ matrix is a rectangular array of numbers with $m$ rows
and n columns. (The number of rows always comes first.) Matrix notation will simplify 
the calculations in the examples that follow. 
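In a program, a matrix of this kind is often stored as a list of rows. The short Python sketch below builds the augmented matrix (4) from the coefficient matrix of system (3); the variable names and the helper `size` are our own choices for illustration.

```python
# Coefficient matrix and right-side constants of system (3)
A = [[1, -2,  1],
     [0,  2, -8],
     [5,  0, -5]]
b = [0, 8, 10]

# Augmented matrix: each row of A with the matching constant appended
augmented = [row + [rhs] for row, rhs in zip(A, b)]

def size(matrix):
    """Return (rows, columns); the number of rows comes first."""
    return len(matrix), len(matrix[0])

print(augmented)        # [[1, -2, 1, 0], [0, 2, -8, 8], [5, 0, -5, 10]]
print(size(A))          # (3, 3)
print(size(augmented))  # (3, 4)
```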


Solving a Linear System 

This section and the next describe an algorithm, or a systematic procedure, for solving 
linear systems. The basic strategy is to replace one system with an equivalent system 
(i.e., one with the same solution set) that is easier to solve. 

Roughly speaking, use the $x_1$ term in the first equation of a system to eliminate the
$x_1$ terms in the other equations. Then use the $x_2$ term in the second equation to eliminate
the $x_2$ terms in the other equations, and so on, until you finally obtain a very simple
equivalent system of equations.

Three basic operations are used to simplify a linear system: Replace one equation 
by the sum of itself and a multiple of another equation, interchange two equations, and 
multiply all the terms in an equation by a nonzero constant. After the first example, you 
will see why these three operations do not change the solution set of the system.

EXAMPLE 1 Solve system (3).

SOLUTION The elimination procedure is shown here with and without matrix notation,
and the results are placed side by side for comparison:

$$\begin{aligned} x_1 - 2x_2 + x_3 &= 0 \\ 2x_2 - 8x_3 &= 8 \\ 5x_1 - 5x_3 &= 10 \end{aligned} \qquad \begin{bmatrix} 1 & -2 & 1 & 0 \\ 0 & 2 & -8 & 8 \\ 5 & 0 & -5 & 10 \end{bmatrix}$$

Keep $x_1$ in the first equation and eliminate it from the other equations. To do so, add $-5$
times equation 1 to equation 3. After some practice, this type of calculation is usually
performed mentally:

$$\begin{array}{rrcl} -5 \cdot \text{[equation 1]} & -5x_1 + 10x_2 - 5x_3 &=& 0 \\ + \text{[equation 3]} & 5x_1 - 5x_3 &=& 10 \\ \hline \text{[new equation 3]} & 10x_2 - 10x_3 &=& 10 \end{array}$$

The result of this calculation is written in place of the original third equation:

$$\begin{aligned} x_1 - 2x_2 + x_3 &= 0 \\ 2x_2 - 8x_3 &= 8 \\ 10x_2 - 10x_3 &= 10 \end{aligned} \qquad \begin{bmatrix} 1 & -2 & 1 & 0 \\ 0 & 2 & -8 & 8 \\ 0 & 10 & -10 & 10 \end{bmatrix}$$


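Now, multiply equation 2 by $1/2$ in order to obtain 1 as the coefficient for $x_2$. (This
calculation will simplify the arithmetic in the next step.)

$$\begin{aligned} x_1 - 2x_2 + x_3 &= 0 \\ x_2 - 4x_3 &= 4 \\ 10x_2 - 10x_3 &= 10 \end{aligned} \qquad \begin{bmatrix} 1 & -2 & 1 & 0 \\ 0 & 1 & -4 & 4 \\ 0 & 10 & -10 & 10 \end{bmatrix}$$

Use the $x_2$ in equation 2 to eliminate the $10x_2$ in equation 3. The “mental” computation is

$$\begin{array}{rrcl} -10 \cdot \text{[equation 2]} & -10x_2 + 40x_3 &=& -40 \\ + \text{[equation 3]} & 10x_2 - 10x_3 &=& 10 \\ \hline \text{[new equation 3]} & 30x_3 &=& -30 \end{array}$$

The result of this calculation is written in place of the previous third equation (row):

$$\begin{aligned} x_1 - 2x_2 + x_3 &= 0 \\ x_2 - 4x_3 &= 4 \\ 30x_3 &= -30 \end{aligned} \qquad \begin{bmatrix} 1 & -2 & 1 & 0 \\ 0 & 1 & -4 & 4 \\ 0 & 0 & 30 & -30 \end{bmatrix}$$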


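Now, multiply equation 3 by $1/30$ in order to obtain 1 as the coefficient for $x_3$. (This
calculation will simplify the arithmetic in the next step.)

$$\begin{aligned} x_1 - 2x_2 + x_3 &= 0 \\ x_2 - 4x_3 &= 4 \\ x_3 &= -1 \end{aligned} \qquad \begin{bmatrix} 1 & -2 & 1 & 0 \\ 0 & 1 & -4 & 4 \\ 0 & 0 & 1 & -1 \end{bmatrix}$$

The new system has a triangular form (the intuitive term triangular will be replaced by
a precise term in the next section):

$$\begin{aligned} x_1 - 2x_2 + x_3 &= 0 \\ x_2 - 4x_3 &= 4 \\ x_3 &= -1 \end{aligned} \qquad \begin{bmatrix} 1 & -2 & 1 & 0 \\ 0 & 1 & -4 & 4 \\ 0 & 0 & 1 & -1 \end{bmatrix}$$

Eventually, you want to eliminate the $-2x_2$ term from equation 1, but it is more efficient
to use the $x_3$ in equation 3 first, to eliminate the $-4x_3$ and $+x_3$ terms in equations 2 and 1.
The two “mental” calculations are

$$\begin{array}{rrcl} 4 \cdot \text{[equation 3]} & 4x_3 &=& -4 \\ + \text{[equation 2]} & x_2 - 4x_3 &=& 4 \\ \hline \text{[new equation 2]} & x_2 &=& 0 \end{array} \qquad \begin{array}{rrcl} -1 \cdot \text{[equation 3]} & -x_3 &=& 1 \\ + \text{[equation 1]} & x_1 - 2x_2 + x_3 &=& 0 \\ \hline \text{[new equation 1]} & x_1 - 2x_2 &=& 1 \end{array}$$

It is convenient to combine the results of these two operations:

$$\begin{aligned} x_1 - 2x_2 &= 1 \\ x_2 &= 0 \\ x_3 &= -1 \end{aligned} \qquad \begin{bmatrix} 1 & -2 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & -1 \end{bmatrix}$$

Now, having cleaned out the column above the $x_3$ in equation 3, move back to the $x_2$ in
equation 2 and use it to eliminate the $-2x_2$ above it. Because of the previous work with
$x_3$, there is now no arithmetic involving $x_3$ terms. Add 2 times equation 2 to equation 1
and obtain the system:

$$\begin{aligned} x_1 &= 1 \\ x_2 &= 0 \\ x_3 &= -1 \end{aligned} \qquad \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & -1 \end{bmatrix}$$

[Margin figure: Each of the original equations determines a plane in three-dimensional space. The point $(1, 0, -1)$ lies in all three planes.]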


The work is essentially done. It shows that the only solution of the original system is
$(1, 0, -1)$. However, since there are so many calculations involved, it is a good practice
to check the work. To verify that $(1, 0, -1)$ is a solution, substitute these values into the
left side of the original system, and compute:

$$\begin{aligned} 1(1) - 2(0) + 1(-1) &= 1 - 0 - 1 = 0 \\ 2(0) - 8(-1) &= 0 + 8 = 8 \\ 5(1) - 5(-1) &= 5 + 5 = 10 \end{aligned}$$

The results agree with the right side of the original system, so $(1, 0, -1)$ is a solution of
the system. ■


Example 1 illustrates how operations on equations in a linear system correspond to 
operations on the appropriate rows of the augmented matrix. The three basic operations 
listed earlier correspond to the following operations on the augmented matrix. 


ELEMENTARY ROW OPERATIONS 

1. (Replacement) Replace one row by the sum of itself and a multiple of another
row.¹

2. (Interchange) Interchange two rows. 

3. (Scaling) Multiply all entries in a row by a nonzero constant. 
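The three operations translate directly into small functions on a matrix stored as a list of rows. The sketch below is illustrative only: the function names `replace`, `interchange`, and `scale` are ours, rows are indexed from 0 as is usual in Python, and exact fractions are used so that scaling by 1/2 or 1/30 introduces no rounding. It replays the row operations of Example 1 on the augmented matrix of system (3).

```python
from fractions import Fraction

def replace(M, target, source, c):
    """Replacement: add c times row `source` to row `target`."""
    M = [row[:] for row in M]
    M[target] = [t + c * s for t, s in zip(M[target], M[source])]
    return M

def interchange(M, i, j):
    """Interchange rows i and j."""
    M = [row[:] for row in M]
    M[i], M[j] = M[j], M[i]
    return M

def scale(M, i, c):
    """Scaling: multiply all entries of row i by a nonzero constant c."""
    if c == 0:
        raise ValueError("scaling constant must be nonzero")
    M = [row[:] for row in M]
    M[i] = [c * x for x in M[i]]
    return M

# Augmented matrix of system (3); Python rows 0, 1, 2 are rows 1, 2, 3 of the text.
M = [[Fraction(x) for x in row]
     for row in [[1, -2, 1, 0], [0, 2, -8, 8], [5, 0, -5, 10]]]

M = replace(M, 2, 0, -5)          # add -5 times row 1 to row 3
M = scale(M, 1, Fraction(1, 2))   # multiply row 2 by 1/2
M = replace(M, 2, 1, -10)         # add -10 times row 2 to row 3
M = scale(M, 2, Fraction(1, 30))  # multiply row 3 by 1/30
M = replace(M, 1, 2, 4)           # add 4 times row 3 to row 2
M = replace(M, 0, 2, -1)          # add -1 times row 3 to row 1
M = replace(M, 0, 1, 2)           # add 2 times row 2 to row 1

print([[int(x) for x in row] for row in M])
# [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, -1]]
```

Each function returns a new matrix rather than changing its argument, which makes it easy to keep intermediate stages for comparison with the hand computation.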


Row operations can be applied to any matrix, not merely to one that arises as the 
augmented matrix of a linear system. Two matrices are called row equivalent if there 
is a sequence of elementary row operations that transforms one matrix into the other. 

It is important to note that row operations are reversible. If two rows are interchanged,
they can be returned to their original positions by another interchange. If a
row is scaled by a nonzero constant $c$, then multiplying the new row by $1/c$ produces
the original row. Finally, consider a replacement operation involving two rows, say,
rows 1 and 2, and suppose that $c$ times row 1 is added to row 2 to produce a new row
2. To “reverse” this operation, add $-c$ times row 1 to (new) row 2 and obtain the original
row 2. See Exercises 29-32 at the end of this section.

¹ A common paraphrase of row replacement is “Add to one row a multiple of another row.”
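The reversal of a replacement operation can be checked numerically. The sketch below uses a hypothetical helper `replace` (rows indexed from 0) and, as an illustration, the first replacement from Example 1, in which $c = -5$ and row 1 is added to row 3 rather than to row 2.

```python
def replace(M, target, source, c):
    """Add c times row `source` to row `target`; returns a new matrix."""
    M = [row[:] for row in M]
    M[target] = [t + c * s for t, s in zip(M[target], M[source])]
    return M

original = [[1, -2, 1, 0],
            [0, 2, -8, 8],
            [5, 0, -5, 10]]
c = -5

changed = replace(original, 2, 0, c)    # add c times row 1 to row 3
restored = replace(changed, 2, 0, -c)   # add -c times row 1 to the new row 3

print(changed[2])            # [0, 10, -10, 10]
print(restored == original)  # True
```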

At the moment, we are interested in row operations on the augmented matrix of a 
system of linear equations. Suppose a system is changed to a new one via row operations. 
By considering each type of row operation, you can see that any solution of the original 
system remains a solution of the new system. Conversely, since the original system can 
be produced via row operations on the new system, each solution of the new system is 
also a solution of the original system. This discussion justifies the following statement. 


If the augmented matrices of two linear systems are row equivalent, then the two 
systems have the same solution set. 


Though Example 1 is lengthy, you will find that after some practice, the calculations 
go quickly. Row operations in the text and exercises will usually be extremely easy to 
perform, allowing you to focus on the underlying concepts. Still, you must learn to 
perform row operations accurately because they will be used throughout the text. 

The rest of this section shows how to use row operations to determine the size of a 
solution set, without completely solving the linear system. 

Existence and Uniqueness Questions 

Section 1.2 will show why a solution set for a linear system contains either no solutions, 
one solution, or infinitely many solutions. Answers to the following two questions will 
determine the nature of the solution set for a linear system. 

To determine which possibility is true for a particular system, we ask two questions. 


TWO FUNDAMENTAL QUESTIONS ABOUT A LINEAR SYSTEM 

1. Is the system consistent; that is, does at least one solution exist?

2. If a solution exists, is it the only one; that is, is the solution unique?


These two questions will appear throughout the text, in many different guises. This 
section and the next will show how to answer these questions via row operations on 
the augmented matrix. 


EXAMPLE 2 Determine if the following system is consistent: 


$$\begin{aligned} x_1 - 2x_2 + x_3 &= 0 \\ 2x_2 - 8x_3 &= 8 \\ 5x_1 - 5x_3 &= 10 \end{aligned}$$

SOLUTION This is the system from Example 1. Suppose that we have performed the
row operations necessary to obtain the triangular form

$$\begin{aligned} x_1 - 2x_2 + x_3 &= 0 \\ x_2 - 4x_3 &= 4 \\ x_3 &= -1 \end{aligned} \qquad \begin{bmatrix} 1 & -2 & 1 & 0 \\ 0 & 1 & -4 & 4 \\ 0 & 0 & 1 & -1 \end{bmatrix}$$

At this point, we know $x_3$. Were we to substitute the value of $x_3$ into equation 2, we
could compute $x_2$ and hence could determine $x_1$ from equation 1. So a solution exists;
the system is consistent. (In fact, $x_2$ is uniquely determined by equation 2 since $x_3$ has
only one possible value, and $x_1$ is therefore uniquely determined by equation 1. So the
solution is unique.) ■

[Margin figure, next to Example 3: The system is inconsistent because there is no point that lies on all three planes.]
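The substitution described in Example 2 (find $x_3$, then $x_2$, then $x_1$) is commonly called back substitution, a name the text has not used. A minimal Python sketch follows; the function name `back_substitute` is our own, and the code assumes a square triangular system whose diagonal coefficients are all nonzero, as in this example.

```python
from fractions import Fraction

def back_substitute(U, b):
    """Solve a triangular system, last equation first.

    Assumes U is square and upper triangular with nonzero diagonal entries.
    """
    n = len(U)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        # Contribution of the variables that are already known
        known = sum(Fraction(U[i][j]) * x[j] for j in range(i + 1, n))
        x[i] = (Fraction(b[i]) - known) / Fraction(U[i][i])
    return x

# Triangular form from Example 2
U = [[1, -2, 1],
     [0, 1, -4],
     [0, 0, 1]]
b = [0, 4, -1]

print([int(v) for v in back_substitute(U, b)])  # [1, 0, -1]
```

Because each step divides by a nonzero diagonal entry, every variable receives exactly one value, which mirrors the uniqueness argument in the example.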

EXAMPLE 3 Determine if the following system is consistent: 


$$\begin{aligned} x_2 - 4x_3 &= 8 \\ 2x_1 - 3x_2 + 2x_3 &= 1 \\ 4x_1 - 8x_2 + 12x_3 &= 1 \end{aligned} \tag{5}$$

SOLUTION The augmented matrix is

$$\begin{bmatrix} 0 & 1 & -4 & 8 \\ 2 & -3 & 2 & 1 \\ 4 & -8 & 12 & 1 \end{bmatrix}$$

To obtain an $x_1$ in the first equation, interchange rows 1 and 2:

$$\begin{bmatrix} 2 & -3 & 2 & 1 \\ 0 & 1 & -4 & 8 \\ 4 & -8 & 12 & 1 \end{bmatrix}$$

To eliminate the $4x_1$ term in the third equation, add $-2$ times row 1 to row 3:

$$\begin{bmatrix} 2 & -3 & 2 & 1 \\ 0 & 1 & -4 & 8 \\ 0 & -2 & 8 & -1 \end{bmatrix} \tag{6}$$

Next, use the $x_2$ term in the second equation to eliminate the $-2x_2$ term from the third
equation. Add 2 times row 2 to row 3:

$$\begin{bmatrix} 2 & -3 & 2 & 1 \\ 0 & 1 & -4 & 8 \\ 0 & 0 & 0 & 15 \end{bmatrix} \tag{7}$$

The augmented matrix is now in triangular form. To interpret it correctly, go back to
equation notation:

$$\begin{aligned} 2x_1 - 3x_2 + 2x_3 &= 1 \\ x_2 - 4x_3 &= 8 \\ 0 &= 15 \end{aligned} \tag{8}$$

The equation $0 = 15$ is a short form of $0x_1 + 0x_2 + 0x_3 = 15$. This system in triangular
form obviously has a built-in contradiction. There are no values of $x_1, x_2, x_3$ that
satisfy (8) because the equation $0 = 15$ is never true. Since (8) and (5) have the same
solution set, the original system is inconsistent (i.e., has no solution). ■
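The telltale row of Example 3 can be detected by a program. The sketch below repeats the three row operations of the example with plain integer arithmetic and then looks for a row whose coefficients are all zero while its constant is not; the helper name `has_contradiction` is ours. The check is meaningful only after the matrix has been brought to triangular form: an unreduced matrix may hide a contradiction that no single row displays.

```python
def has_contradiction(augmented):
    """True if some row reads 0 = b with b nonzero."""
    return any(all(a == 0 for a in row[:-1]) and row[-1] != 0
               for row in augmented)

# Augmented matrix of system (5)
M = [[0, 1, -4, 8],
     [2, -3, 2, 1],
     [4, -8, 12, 1]]

M[0], M[1] = M[1], M[0]                              # interchange rows 1 and 2
M[2] = [r3 - 2 * r1 for r3, r1 in zip(M[2], M[0])]   # add -2 times row 1 to row 3
M[2] = [r3 + 2 * r2 for r3, r2 in zip(M[2], M[1])]   # add 2 times row 2 to row 3

print(M[2])                  # [0, 0, 0, 15]
print(has_contradiction(M))  # True
```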

Pay close attention to the augmented matrix in (7). Its last row is typical of an
inconsistent system in triangular form.

NUMERICAL NOTE

In real-world problems, systems of linear equations are solved by a computer.
For a square coefficient matrix, computer programs nearly always use the elimination
algorithm given here and in Section 1.2, modified slightly for improved
accuracy.

The vast majority of linear algebra problems in business and industry are
solved with programs that use floating point arithmetic. Numbers are represented
as decimals $\pm .d_1 \cdots d_p \times 10^r$, where $r$ is an integer and the number $p$ of digits to
the right of the decimal point is usually between 8 and 16. Arithmetic with such
numbers typically is inexact, because the result must be rounded (or truncated)
to the number of digits stored. “Roundoff error” is also introduced when a
number such as 1/3 is entered into the computer, since its decimal representation
must be approximated by a finite number of digits. Fortunately, inaccuracies in
floating point arithmetic seldom cause problems. The numerical notes in this
book will occasionally warn of issues that you may need to consider later in your
career.
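Roundoff error is easy to observe. The short Python experiment below is only an illustration: Python's built-in floating point numbers are stored in binary rather than in the decimal form described in the note, but the effect is of the same kind. The `fractions` module from the standard library is used for comparison because it stores rational numbers exactly.

```python
from fractions import Fraction

# The floating point sum 0.1 + 0.2 is not exactly the floating point number 0.3.
print(0.1 + 0.2 == 0.3)                    # False

# Entering 1/3 already introduces roundoff: the stored value is not exactly 1/3.
third = 1 / 3
print(Fraction(third) == Fraction(1, 3))   # False

# Exact rational arithmetic has no such error.
print(Fraction(1, 3) * 3 == 1)             # True
```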


PRACTICE PROBLEMS 


Throughout the text, practice problems should be attempted before working the exercises.
Solutions appear after each exercise set.

1. State in words the next elementary row operation that should be performed on the
system in order to solve it. [More than one answer is possible in (a).]

$$\text{a.}\quad \begin{aligned} x_1 + 4x_2 - 2x_3 + 8x_4 &= 12 \\ x_2 - 7x_3 + 2x_4 &= -4 \\ 5x_3 - x_4 &= 7 \\ x_3 + 3x_4 &= -5 \end{aligned} \qquad\qquad \text{b.}\quad \begin{aligned} x_1 - 3x_2 + 5x_3 - 2x_4 &= 0 \\ x_2 + 8x_3 &= -4 \\ 2x_3 &= 3 \\ x_4 &= 1 \end{aligned}$$

2. The augmented matrix of a linear system has been transformed by row operations
into the form below. Determine if the system is consistent.

$$\begin{bmatrix} 1 & 5 & 2 & -6 \\ 0 & 4 & -7 & 2 \\ 0 & 0 & 5 & 0 \end{bmatrix}$$

3. Is $(3, 4, -2)$ a solution of the following system?

$$\begin{aligned} 5x_1 - x_2 + 2x_3 &= 7 \\ -2x_1 + 6x_2 + 9x_3 &= 0 \\ -7x_1 + 5x_2 - 3x_3 &= -7 \end{aligned}$$

4. For what values of $h$ and $k$ is the following system consistent?

$$\begin{aligned} 2x_1 - x_2 &= h \\ -6x_1 + 3x_2 &= k \end{aligned}$$

1.1 EXERCISES

Solve each system in Exercises 1-4 by using elementary row
operations on the equations or on the augmented matrix. Follow
the systematic elimination procedure described in this section.


$$\text{1.}\quad \begin{aligned} x_1 + 5x_2 &= 7 \\ -2x_1 - 7x_2 &= -5 \end{aligned} \qquad\qquad \text{2.}\quad \begin{aligned} 2x_1 + 4x_2 &= -4 \\ 5x_1 + 7x_2 &= 11 \end{aligned}$$

3. Find the point $(x_1, x_2)$ that lies on the line $x_1 + 5x_2 = 1$ and
on the line $x_1 - 2x_2 = -2$. See the figure.

4. Find the point of intersection of the lines $x_1 - 5x_2 = 1$ and
$3x_1 - 7x_2 = 5$.

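Consider each matrix in Exercises 5 and 6 as the augmented matrix
of a linear system. State in words the next two elementary row
operations that should be performed in the process of solving the
system.

$$\text{5.}\quad \begin{bmatrix} 1 & -4 & 5 & 0 & 7 \\ 0 & 1 & -3 & 0 & 6 \\ 0 & 0 & 1 & 0 & 2 \\ 0 & 0 & 0 & 1 & -5 \end{bmatrix} \qquad\qquad \text{6.}\quad \begin{bmatrix} 1 & -6 & 4 & 0 & -1 \\ 0 & 2 & -7 & 0 & 4 \\ 0 & 0 & 1 & 2 & -3 \\ 0 & 0 & 3 & 1 & 6 \end{bmatrix}$$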


In Exercises 7-10, the augmented matrix of a linear system has 
been reduced by row operations to the form shown. In each case, 
continue the appropriate row operations and describe the solution 
set of the original system. 




Solve the systems in Exercises 11-14. 


$$\text{11.}\quad \begin{aligned} x_2 + 4x_3 &= -5 \\ x_1 + 3x_2 + 5x_3 &= -2 \\ 3x_1 + 7x_2 + 7x_3 &= 6 \end{aligned} \qquad\qquad \text{12.}\quad \begin{aligned} x_1 - 3x_2 + 4x_3 &= -4 \\ 3x_1 - 7x_2 + 7x_3 &= -8 \\ -4x_1 + 6x_2 - x_3 &= 7 \end{aligned}$$

$$\text{13.}\quad \begin{aligned} x_1 - 3x_3 &= 8 \\ 2x_1 + 2x_2 + 9x_3 &= 7 \\ x_2 + 5x_3 &= -2 \end{aligned} \qquad\qquad \text{14.}\quad \begin{aligned} x_1 - 3x_2 &= 5 \\ -x_1 + x_2 + 5x_3 &= 2 \\ x_2 + x_3 &= 0 \end{aligned}$$

Determine if the systems in Exercises 15 and 16 are consistent.
Do not completely solve the systems.

$$\text{15.}\quad \begin{aligned} x_1 + 3x_3 &= 2 \\ x_2 - 3x_4 &= 3 \\ -2x_2 + 3x_3 + 2x_4 &= 1 \\ 3x_1 + 7x_4 &= -5 \end{aligned} \qquad\qquad \text{16.}\quad \begin{aligned} x_1 - 2x_4 &= -3 \\ 2x_2 + 2x_3 &= 0 \\ x_3 + 3x_4 &= 1 \\ -2x_1 + 3x_2 + 2x_3 + x_4 &= 5 \end{aligned}$$

17. Do the three lines $x_1 - 4x_2 = 1$, $2x_1 - x_2 = -3$, and
$-x_1 - 3x_2 = 4$ have a common point of intersection?
Explain.

18. Do the three planes $x_1 + 2x_2 + x_3 = 4$, $x_2 - x_3 = 1$, and
$x_1 + 3x_2 = 0$ have at least one common point of intersection?
Explain.


In Exercises 19-22, determine the value(s) of h such that the 
matrix is the augmented matrix of a consistent linear system. 



In Exercises 23 and 24, key statements from this section are 
either quoted directly, restated slightly (but still true), or altered 
in some way that makes them false in some cases. Mark each 
statement True or False, and justify your answer. (If true, give the 
approximate location where a similar statement appears, or refer 
to a definition or theorem. If false, give the location of a statement 
that has been quoted or used incorrectly, or cite an example that 
shows the statement is not true in all cases.) Similar true/false 
questions will appear in many sections of the text.

23. a. Every elementary row operation is reversible.

b. A $5 \times 6$ matrix has six rows.

c. The solution set of a linear system involving variables
$x_1, \dots, x_n$ is a list of numbers $(s_1, \dots, s_n)$ that makes each
equation in the system a true statement when the values
$s_1, \dots, s_n$ are substituted for $x_1, \dots, x_n$, respectively.

d. Two fundamental questions about a linear system involve
existence and uniqueness.

24. a. Elementary row operations on an augmented matrix never
change the solution set of the associated linear system.

b. Two matrices are row equivalent if they have the same
number of rows.

c. An inconsistent system has more than one solution.

d. Two linear systems are equivalent if they have the same
solution set.

25. Find an equation involving $g$, $h$, and $k$ that makes this
augmented matrix correspond to a consistent system:

$$\begin{bmatrix} 1 & -4 & 7 & g \\ 0 & 3 & -5 & h \\ -2 & 5 & -9 & k \end{bmatrix}$$

26. Construct three different augmented matrices for linear systems
whose solution set is $x_1 = -2$, $x_2 = 1$, $x_3 = 0$.

27. Suppose the system below is consistent for all possible values
of $f$ and $g$. What can you say about the coefficients $c$ and $d$?
Justify your answer.

$$\begin{aligned} x_1 + 3x_2 &= f \\ cx_1 + dx_2 &= g \end{aligned}$$

28. Suppose $a$, $b$, $c$, and $d$ are constants such that $a$ is not zero
and the system below is consistent for all possible values of
$f$ and $g$. What can you say about the numbers $a$, $b$, $c$, and $d$?
Justify your answer.

$$\begin{aligned} ax_1 + bx_2 &= f \\ cx_1 + dx_2 &= g \end{aligned}$$
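In Exercises 29-32, find the elementary row operation that transforms
the first matrix into the second, and then find the reverse
row operation that transforms the second matrix into the first.

$$\text{29.}\quad \begin{bmatrix} 0 & -2 & 5 \\ 1 & 4 & -7 \\ 3 & -1 & 6 \end{bmatrix}, \quad \begin{bmatrix} 1 & 4 & -7 \\ 0 & -2 & 5 \\ 3 & -1 & 6 \end{bmatrix}$$

$$\text{30.}\quad \begin{bmatrix} 1 & 3 & -4 \\ 0 & -2 & 6 \\ 0 & -5 & 9 \end{bmatrix}, \quad \begin{bmatrix} 1 & 3 & -4 \\ 0 & 1 & -3 \\ 0 & -5 & 9 \end{bmatrix}$$

$$\text{31.}\quad \begin{bmatrix} 1 & -2 & 1 & 0 \\ 0 & 5 & -2 & 8 \\ 4 & -1 & 3 & -6 \end{bmatrix}, \quad \begin{bmatrix} 1 & -2 & 1 & 0 \\ 0 & 5 & -2 & 8 \\ 0 & 7 & -1 & -6 \end{bmatrix}$$

$$\text{32.}\quad \begin{bmatrix} 1 & 2 & -5 & 0 \\ 0 & 1 & -3 & -2 \\ 0 & -3 & 9 & 5 \end{bmatrix}, \quad \begin{bmatrix} 1 & 2 & -5 & 0 \\ 0 & 1 & -3 & -2 \\ 0 & 0 & 0 & -1 \end{bmatrix}$$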


An important concern in the study of heat transfer is to determine 
the steady-state temperature distribution of a thin plate when the 
temperature around the boundary is known. Assume the plate 
shown in the figure represents a cross section of a metal beam, 
with negligible heat flow in the direction perpendicular to the 
plate. Let $T_1, \dots, T_4$ denote the temperatures at the four interior
nodes of the mesh in the figure. The temperature at a node is
approximately equal to the average of the four nearest nodes
(to the left, above, to the right, and below).² For instance,

$$T_1 = (10 + 20 + T_2 + T_4)/4, \quad \text{or} \quad 4T_1 - T_2 - T_4 = 30$$

33. Write a system of four equations whose solution gives estimates
for the temperatures $T_1, \dots, T_4$.

34. Solve the system of equations from Exercise 33. [Hint: To
speed up the calculations, interchange rows 1 and 4 before
starting “replace” operations.]

² See Frank M. White, Heat and Mass Transfer (Reading, MA:
Addison-Wesley Publishing, 1991), pp. 145-149.


SOLUTIONS TO PRACTICE PROBLEMS 

1. a. For “hand computation,” the best choice is to interchange equations 3 and 4.
Another possibility is to multiply equation 3 by 1/5. Or, replace equation 4 by
its sum with $-1/5$ times row 3. (In any case, do not use the $x_2$ in equation 2 to
eliminate the $4x_2$ in equation 1. Wait until a triangular form has been reached and
the $x_3$ terms and $x_4$ terms have been eliminated from the first two equations.)

b. The system is in triangular form. Further simplification begins with the $x_4$ in the
fourth equation. Use the $x_4$ to eliminate all $x_4$ terms above it. The appropriate
step now is to add 2 times equation 4 to equation 1. (After that, move to equation 3,
multiply it by 1/2, and then use the equation to eliminate the $x_3$ terms
above it.)

2. The system corresponding to the augmented matrix is

$$\begin{aligned} x_1 + 5x_2 + 2x_3 &= -6 \\ 4x_2 - 7x_3 &= 2 \\ 5x_3 &= 0 \end{aligned}$$

The third equation makes $x_3 = 0$, which is certainly an allowable value for $x_3$. After
eliminating the $x_3$ terms in equations 1 and 2, you could go on to solve for unique
values for $x_2$ and $x_1$. Hence a solution exists, and it is unique. Contrast this situation
with that in Example 3.

3. It is easy to check if a specific list of numbers is a solution. Set $x_1 = 3$, $x_2 = 4$, and
$x_3 = -2$, and find that

$$\begin{aligned} 5(3) - (4) + 2(-2) &= 15 - 4 - 4 = 7 \\ -2(3) + 6(4) + 9(-2) &= -6 + 24 - 18 = 0 \\ -7(3) + 5(4) - 3(-2) &= -21 + 20 + 6 = 5 \end{aligned}$$

Although the first two equations are satisfied, the third is not, so $(3, 4, -2)$ is not a
solution of the system. Notice the use of parentheses when making the substitutions.
They are strongly recommended as a guard against arithmetic errors.

[Margin figure: Since $(3, 4, -2)$ satisfies the first two equations, it is on the line of the intersection of the first two planes. Since $(3, 4, -2)$ does not satisfy all three equations, it does not lie on all three planes.]

4. When the second equation is replaced by its sum with 3 times the first equation, the
system becomes

$$\begin{aligned} 2x_1 - x_2 &= h \\ 0 &= k + 3h \end{aligned}$$

If $k + 3h$ is nonzero, the system has no solution. The system is consistent for any
values of $h$ and $k$ that make $k + 3h = 0$.


1.2 ROW REDUCTION AND ECHELON FORMS 


This section refines the method of Section 1.1 into a row reduction algorithm that will
enable us to analyze any system of linear equations.¹ By using only the first part of
the algorithm, we will be able to answer the fundamental existence and uniqueness
questions posed in Section 1.1.

The algorithm applies to any matrix, whether or not the matrix is viewed as an
augmented matrix for a linear system. So the first part of this section concerns an arbitrary
rectangular matrix and begins by introducing two important classes of matrices that
include the “triangular” matrices of Section 1.1. In the definitions that follow, a nonzero
row or column in a matrix means a row or column that contains at least one nonzero
entry; a leading entry of a row refers to the leftmost nonzero entry (in a nonzero row).

¹ The algorithm here is a variant of what is commonly called Gaussian elimination. A similar elimination
method for linear systems was used by Chinese mathematicians in about 250 B.C. The process was unknown
in Western culture until the nineteenth century, when a famous German mathematician, Carl Friedrich Gauss,
discovered it. A German engineer, Wilhelm Jordan, popularized the algorithm in an 1888 text on geodesy.

DEFINITION

A rectangular matrix is in echelon form (or row echelon form) if it has the
following three properties:

1. All nonzero rows are above any rows of all zeros.

2. Each leading entry of a row is in a column to the right of the leading entry of
the row above it.

3. All entries in a column below a leading entry are zeros.

If a matrix in echelon form satisfies the following additional conditions, then it is
in reduced echelon form (or reduced row echelon form):

4. The leading entry in each nonzero row is 1.

5. Each leading 1 is the only nonzero entry in its column.

An echelon matrix (respectively, reduced echelon matrix) is one that is in echelon
form (respectively, reduced echelon form). Property 2 says that the leading entries form
an echelon (“steplike”) pattern that moves down and to the right through the matrix.
Property 3 is a simple consequence of property 2, but we include it for emphasis.
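The five properties can be turned into a direct test. The Python sketch below is one possible reading of the definition; the function names `leading_index`, `is_echelon`, and `is_reduced_echelon` are hypothetical. Property 3 is not tested separately because, as just noted, it follows once properties 1 and 2 hold.

```python
def leading_index(row):
    """Column index of the leading entry, or None for a row of all zeros."""
    for j, x in enumerate(row):
        if x != 0:
            return j
    return None

def is_echelon(M):
    seen_zero_row = False
    previous = -1
    for row in M:
        lead = leading_index(row)
        if lead is None:
            seen_zero_row = True
        elif seen_zero_row or lead <= previous:
            return False          # property 1 or property 2 fails
        else:
            previous = lead
    return True

def is_reduced_echelon(M):
    if not is_echelon(M):
        return False
    for i, row in enumerate(M):
        lead = leading_index(row)
        if lead is None:
            continue
        if row[lead] != 1:
            return False          # property 4 fails
        if any(M[k][lead] != 0 for k in range(len(M)) if k != i):
            return False          # property 5 fails
    return True

triangular = [[2, -3, 2, 1], [0, 1, -4, 8], [0, 0, 0, 2.5]]
reduced = [[1, 0, 0, 29], [0, 1, 0, 16], [0, 0, 1, 3]]

print(is_echelon(triangular), is_reduced_echelon(triangular))  # True False
print(is_echelon(reduced), is_reduced_echelon(reduced))        # True True
print(is_echelon([[0, 0], [1, 0]]))                            # False
```

The last matrix fails property 1: its zero row sits above a nonzero row.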

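The “triangular” matrices of Section 1.1, such as

$$\begin{bmatrix} 2 & -3 & 2 & 1 \\ 0 & 1 & -4 & 8 \\ 0 & 0 & 0 & 5/2 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 & 0 & 0 & 29 \\ 0 & 1 & 0 & 16 \\ 0 & 0 & 1 & 3 \end{bmatrix}$$

are in echelon form. In fact, the second matrix is in reduced echelon form. Here are
additional examples.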


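EXAMPLE 1 The following matrices are in echelon form. The leading entries ($\blacksquare$)
may have any nonzero value; the starred entries ($*$) may have any value (including zero).

$$\begin{bmatrix} \blacksquare & * & * & * \\ 0 & \blacksquare & * & * \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & \blacksquare & * & * & * & * & * & * & * & * \\ 0 & 0 & 0 & \blacksquare & * & * & * & * & * & * \\ 0 & 0 & 0 & 0 & \blacksquare & * & * & * & * & * \\ 0 & 0 & 0 & 0 & 0 & \blacksquare & * & * & * & * \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \blacksquare & * \end{bmatrix}$$

The following matrices are in reduced echelon form because the leading entries are 1’s,
and there are 0’s below and above each leading 1.

$$\begin{bmatrix} 1 & 0 & * & * \\ 0 & 1 & * & * \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 & * & 0 & 0 & 0 & * & * & 0 & * \\ 0 & 0 & 0 & 1 & 0 & 0 & * & * & 0 & * \\ 0 & 0 & 0 & 0 & 1 & 0 & * & * & 0 & * \\ 0 & 0 & 0 & 0 & 0 & 1 & * & * & 0 & * \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & * \end{bmatrix}$$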

Any nonzero matrix may be row reduced (that is, transformed by elementary row 
operations) into more than one matrix in echelon form, using different sequences of row 
operations. However, the reduced echelon form one obtains from a matrix is unique. The 
following theorem is proved in Appendix A at the end of the text. 


THEOREM 1 Uniqueness of the Reduced Echelon Form

Each matrix is row equivalent to one and only one reduced echelon matrix.

If a matrix $A$ is row equivalent to an echelon matrix $U$, we call $U$ an echelon form
(or row echelon form) of $A$; if $U$ is in reduced echelon form, we call $U$ the reduced
echelon form of $A$. [Most matrix programs and calculators with matrix capabilities
use the abbreviation RREF for reduced (row) echelon form. Some use REF for (row)
echelon form.]

Pivot Positions 

When row operations on a matrix produce an echelon form, further row operations to 
obtain the reduced echelon form do not change the positions of the leading entries. Since 
the reduced echelon form is unique, the leading entries are always in the same positions 
in any echelon form obtained from a given matrix. These leading entries correspond to 
leading 1’s in the reduced echelon form.


DEFINITION 


A pivot position in a matrix A is a location in A that corresponds to a leading 1 
in the reduced echelon form of A . A pivot column is a column of A that contains 
a pivot position. 


In Example 1, the squares ($\blacksquare$) identify the pivot positions. Many fundamental concepts
in the first four chapters will be connected in one way or another with pivot
positions in a matrix.

EXAMPLE 2 Row reduce the matrix $A$ below to echelon form, and locate the pivot
columns of $A$.

$$A = \begin{bmatrix} 0 & -3 & -6 & 4 & 9 \\ -1 & -2 & -1 & 3 & 1 \\ -2 & -3 & 0 & 3 & -1 \\ 1 & 4 & 5 & -9 & -7 \end{bmatrix}$$

SOLUTION Use the same basic strategy as in Section 1.1. The top of the leftmost
nonzero column is the first pivot position. A nonzero entry, or pivot, must be placed
in this position. A good choice is to interchange rows 1 and 4 (because the mental
computations in the next step will not involve fractions).

$$\begin{bmatrix} 1 & 4 & 5 & -9 & -7 \\ -1 & -2 & -1 & 3 & 1 \\ -2 & -3 & 0 & 3 & -1 \\ 0 & -3 & -6 & 4 & 9 \end{bmatrix}$$

(Pivot: the 1 in row 1. Pivot column: column 1.)

Create zeros below the pivot, 1, by adding multiples of the first row to the rows below,
and obtain matrix (1) below. The pivot position in the second row must be as far left as
possible, namely, in the second column. Choose the 2 in this position as the next pivot.

$$\begin{bmatrix} 1 & 4 & 5 & -9 & -7 \\ 0 & 2 & 4 & -6 & -6 \\ 0 & 5 & 10 & -15 & -15 \\ 0 & -3 & -6 & 4 & 9 \end{bmatrix} \tag{1}$$

(Pivot: the 2 in row 2. Next pivot column: column 2.)

Add $-5/2$ times row 2 to row 3, and add $3/2$ times row 2 to row 4.

$$\begin{bmatrix} 1 & 4 & 5 & -9 & -7 \\ 0 & 2 & 4 & -6 & -6 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -5 & 0 \end{bmatrix} \tag{2}$$

The matrix in (2) is different from any encountered in Section 1.1. There is no way to
create a leading entry in column 3! (We can’t use row 1 or 2 because doing so would
destroy the echelon arrangement of the leading entries already produced.) However, if
we interchange rows 3 and 4, we can produce a leading entry in column 4.

$$\begin{bmatrix} 1 & 4 & 5 & -9 & -7 \\ 0 & 2 & 4 & -6 & -6 \\ 0 & 0 & 0 & -5 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \qquad \text{General form:} \quad \begin{bmatrix} \blacksquare & * & * & * & * \\ 0 & \blacksquare & * & * & * \\ 0 & 0 & 0 & \blacksquare & * \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

(Pivots: 1, 2, and $-5$. Pivot columns: columns 1, 2, and 4.)

The matrix is in echelon form and thus reveals that columns 1, 2, and 4 of $A$ are pivot
columns.

$$A = \begin{bmatrix} \boxed{0} & -3 & -6 & 4 & 9 \\ -1 & \boxed{-2} & -1 & 3 & 1 \\ -2 & -3 & 0 & \boxed{3} & -1 \\ 1 & 4 & 5 & -9 & -7 \end{bmatrix} \tag{3}$$

(Pivot positions are boxed. Pivot columns: columns 1, 2, and 4.)

A pivot, as illustrated in Example 2, is a nonzero number in a pivot position that is
used as needed to create zeros via row operations. The pivots in Example 2 were 1, 2,
and $-5$. Notice that these numbers are not the same as the actual elements of $A$ in the
highlighted pivot positions shown in (3).
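Since the pivot positions do not depend on which echelon form is reached, a program may choose its own sequence of row operations and still locate the same pivot columns. The sketch below is one such program, not the hand computation of Example 2: the function name `pivot_columns` is ours, it takes the first nonzero entry it finds in a column as the pivot (so its pivots differ from 1, 2, and $-5$), and it uses exact fractions.

```python
from fractions import Fraction

def pivot_columns(A):
    """Return the pivot columns of A, numbered from 1 as in the text."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    pivots = []
    r = 0                                   # row of the next pivot position
    for c in range(cols):
        # Look for a nonzero entry in column c, at or below row r.
        p = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if p is None:
            continue                        # no pivot in this column
        M[r], M[p] = M[p], M[r]             # interchange
        for i in range(r + 1, rows):        # replacement: zeros below the pivot
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        pivots.append(c + 1)
        r += 1
        if r == rows:
            break
    return pivots

A = [[0, -3, -6, 4, 9],
     [-1, -2, -1, 3, 1],
     [-2, -3, 0, 3, -1],
     [1, 4, 5, -9, -7]]

print(pivot_columns(A))  # [1, 2, 4]
```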

With Example 2 as a guide, we are ready to describe an efficient procedure for 
transforming a matrix into an echelon or reduced echelon matrix. Careful study and 
mastery of this procedure now will pay rich dividends later in the course. 


The Row Reduction Algorithm 

The algorithm that follows consists of four steps, and it produces a matrix in echelon 
form. A fifth step produces a matrix in reduced echelon form. We illustrate the algorithm 
by an example. 

EXAMPLE 3 Apply elementary row operations to transform the following matrix
first into echelon form and then into reduced echelon form:

$$\begin{bmatrix} 0 & 3 & -6 & 6 & 4 & -5 \\ 3 & -7 & 8 & -5 & 8 & 9 \\ 3 & -9 & 12 & -9 & 6 & 15 \end{bmatrix}$$

SOLUTION

STEP 1

Begin with the leftmost nonzero column. This is a pivot column. The pivot
position is at the top.

$$\begin{bmatrix} 0 & 3 & -6 & 6 & 4 & -5 \\ 3 & -7 & 8 & -5 & 8 & 9 \\ 3 & -9 & 12 & -9 & 6 & 15 \end{bmatrix}$$

(Pivot column: column 1.)

STEP 2

Select a nonzero entry in the pivot column as a pivot. If necessary, interchange
rows to move this entry into the pivot position.

Interchange rows 1 and 3. (We could have interchanged rows 1 and 2 instead.)

$$\begin{bmatrix} 3 & -9 & 12 & -9 & 6 & 15 \\ 3 & -7 & 8 & -5 & 8 & 9 \\ 0 & 3 & -6 & 6 & 4 & -5 \end{bmatrix}$$

(Pivot: the 3 in row 1.)

STEP 3

Use row replacement operations to create zeros in all positions below the pivot.

As a preliminary step, we could divide the top row by the pivot, 3. But with two 3’s in
column 1, it is just as easy to add $-1$ times row 1 to row 2.

$$\begin{bmatrix} 3 & -9 & 12 & -9 & 6 & 15 \\ 0 & 2 & -4 & 4 & 2 & -6 \\ 0 & 3 & -6 & 6 & 4 & -5 \end{bmatrix}$$

(Pivot: the 3 in row 1.)

STEP 4

Cover (or ignore) the row containing the pivot position and cover all rows, if any,
above it. Apply steps 1-3 to the submatrix that remains. Repeat the process until
there are no more nonzero rows to modify.

With row 1 covered, step 1 shows that column 2 is the next pivot column; for step 2,
select as a pivot the “top” entry in that column.

$$\begin{bmatrix} 3 & -9 & 12 & -9 & 6 & 15 \\ 0 & 2 & -4 & 4 & 2 & -6 \\ 0 & 3 & -6 & 6 & 4 & -5 \end{bmatrix}$$

(Pivot: the 2 in row 2. New pivot column: column 2.)

For step 3, we could insert an optional step of dividing the “top” row of the submatrix by
the pivot, 2. Instead, we add $-3/2$ times the “top” row to the row below. This produces

$$\begin{bmatrix} 3 & -9 & 12 & -9 & 6 & 15 \\ 0 & 2 & -4 & 4 & 2 & -6 \\ 0 & 0 & 0 & 0 & 1 & 4 \end{bmatrix}$$

When we cover the row containing the second pivot position for step 4, we are left with
a new submatrix having only one row:

$$\begin{bmatrix} 3 & -9 & 12 & -9 & 6 & 15 \\ 0 & 2 & -4 & 4 & 2 & -6 \\ 0 & 0 & 0 & 0 & 1 & 4 \end{bmatrix}$$

(Pivot: the 1 in row 3.)

Steps 1-3 require no work for this submatrix, and we have reached an echelon form of
the full matrix. If we want the reduced echelon form, we perform one more step.

STEP 5

Beginning with the rightmost pivot and working upward and to the left, create
zeros above each pivot. If a pivot is not 1, make it 1 by a scaling operation.

The rightmost pivot is in row 3. Create zeros above it, adding suitable multiples of row
3 to rows 2 and 1.

$$\begin{bmatrix} 3 & -9 & 12 & -9 & 0 & -9 \\ 0 & 2 & -4 & 4 & 0 & -14 \\ 0 & 0 & 0 & 0 & 1 & 4 \end{bmatrix} \quad \begin{array}{l} \text{Row 1} + (-6) \cdot \text{row 3} \\ \text{Row 2} + (-2) \cdot \text{row 3} \\ \ \end{array}$$

The next pivot is in row 2. Scale this row, dividing by the pivot.

$$\begin{bmatrix} 3 & -9 & 12 & -9 & 0 & -9 \\ 0 & 1 & -2 & 2 & 0 & -7 \\ 0 & 0 & 0 & 0 & 1 & 4 \end{bmatrix} \quad \begin{array}{l} \ \\ \text{Row scaled by } \tfrac{1}{2} \\ \ \end{array}$$

Create a zero in column 2 by adding 9 times row 2 to row 1.

$$\begin{bmatrix} 3 & 0 & -6 & 9 & 0 & -72 \\ 0 & 1 & -2 & 2 & 0 & -7 \\ 0 & 0 & 0 & 0 & 1 & 4 \end{bmatrix} \quad \begin{array}{l} \text{Row 1} + (9) \cdot \text{row 2} \\ \ \\ \ \end{array}$$

Finally, scale row 1, dividing by the pivot, 3.

$$\begin{bmatrix} 1 & 0 & -2 & 3 & 0 & -24 \\ 0 & 1 & -2 & 2 & 0 & -7 \\ 0 & 0 & 0 & 0 & 1 & 4 \end{bmatrix} \quad \begin{array}{l} \text{Row scaled by } \tfrac{1}{3} \\ \ \\ \ \end{array}$$

This is the reduced echelon form of the original matrix.


The combination of steps 1-4 is called the forward phase of the row reduction
algorithm. Step 5, which produces the unique reduced echelon form, is called the
backward phase.
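
The two phases can be sketched in a few lines of Python. This is only an illustrative sketch, not the text's official procedure: the function name `rref` is hypothetical, exact `Fraction` arithmetic is assumed so that no roundoff occurs, and the pivot is taken to be the first nonzero entry found in the pivot column (the text allows any nonzero entry).

```python
from fractions import Fraction

def rref(rows):
    """Return the reduced echelon form of a matrix given as a list of rows."""
    A = [[Fraction(x) for x in row] for row in rows]
    m, n = len(A), len(A[0])
    pivots = []              # (row, column) of each pivot position
    r = 0
    # Forward phase (steps 1-4): produce an echelon form.
    for c in range(n):
        if r == m:
            break
        # Steps 1-2: look for a nonzero entry in column c, at or below row r.
        p = next((i for i in range(r, m) if A[i][c] != 0), None)
        if p is None:
            continue         # column c is not a pivot column
        A[r], A[p] = A[p], A[r]
        # Step 3: create zeros below the pivot.
        for i in range(r + 1, m):
            f = A[i][c] / A[r][c]
            A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        pivots.append((r, c))
        r += 1
    # Backward phase (step 5): start with the rightmost pivot.
    for pr, pc in reversed(pivots):
        piv = A[pr][pc]
        A[pr] = [a / piv for a in A[pr]]       # scale the pivot to 1
        for i in range(pr):                    # create zeros above it
            f = A[i][pc]
            A[i] = [a - f * b for a, b in zip(A[i], A[pr])]
    return A

# The matrix of Example 3.
example3 = [[0, 3, -6, 6, 4, -5],
            [3, -7, 8, -5, 8, 9],
            [3, -9, 12, -9, 6, 15]]
for row in rref(example3):
    print([str(x) for x in row])
```

Because the reduced echelon form is unique, the sketch arrives at the same final matrix as the hand computation above even though it chooses a different row interchange.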


NUMERICAL NOTE

In step 2 above, a computer program usually selects as a pivot the entry in a 
column having the largest absolute value. This strategy, called partial pivoting, 
is used because it reduces roundoff errors in the calculations. 
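
As a rough illustration of partial pivoting, the sketch below carries out the forward phase in floating point and, in each pivot column, interchanges rows so that the entry of largest absolute value becomes the pivot. The function name and the tolerance `tol` are assumptions made for this sketch; production libraries handle these details differently.

```python
def echelon_partial_pivoting(rows, tol=1e-12):
    """Forward phase in floating point, choosing the largest available pivot."""
    A = [[float(x) for x in row] for row in rows]
    m, n = len(A), len(A[0])
    r = 0
    for c in range(n):
        if r == m:
            break
        # Partial pivoting: entry of largest absolute value in column c, rows r..m-1.
        p = max(range(r, m), key=lambda i: abs(A[i][c]))
        if abs(A[p][c]) <= tol:
            continue                      # treat the column as having no pivot
        A[r], A[p] = A[p], A[r]
        for i in range(r + 1, m):
            f = A[i][c] / A[r][c]
            A[i] = [a - f * b for a, b in zip(A[i], A[r])]
            A[i][c] = 0.0                 # store an exact zero below the pivot
        r += 1
    return A

E = echelon_partial_pivoting([[0, 3, -6, 6, 4, -5],
                              [3, -7, 8, -5, 8, 9],
                              [3, -9, 12, -9, 6, 15]])
for row in E:
    print(row)
```

The tolerance matters: in floating point, an entry that should be exactly zero may come out as a tiny nonzero number, and without a tolerance it could be mistaken for a pivot.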
Solutions of Linear Systems

The row reduction algorithm leads directly to an explicit description of the solution set 
of a linear system when the algorithm is applied to the augmented matrix of the system. 

Suppose, for example, that the augmented matrix of a linear system has been
changed into the equivalent reduced echelon form

\[
\begin{bmatrix}
1 & 0 & -5 & 1 \\
0 & 1 & 1 & 4 \\
0 & 0 & 0 & 0
\end{bmatrix}
\]

There are three variables because the augmented matrix has four columns. The
associated system of equations is

\[
\begin{aligned}
x_1 - 5x_3 &= 1 \\
x_2 + x_3 &= 4 \\
0 &= 0
\end{aligned}
\tag{4}
\]

The variables \(x_1\) and \(x_2\) corresponding to pivot columns in the matrix are called basic
variables.² The other variable, \(x_3\), is called a free variable.

Whenever a system is consistent, as in (4), the solution set can be described
explicitly by solving the reduced system of equations for the basic variables in terms of
the free variables. This operation is possible because the reduced echelon form places
each basic variable in one and only one equation. In (4), solve the first equation for \(x_1\)
and the second for \(x_2\). (Ignore the third equation; it offers no restriction on the variables.)

\[
\begin{cases}
x_1 = 1 + 5x_3 \\
x_2 = 4 - x_3 \\
x_3 \text{ is free}
\end{cases}
\tag{5}
\]

The statement “\(x_3\) is free” means that you are free to choose any value for \(x_3\). Once
that is done, the formulas in (5) determine the values for \(x_1\) and \(x_2\). For instance, when
\(x_3 = 0\), the solution is \((1, 4, 0)\); when \(x_3 = 1\), the solution is \((6, 3, 1)\). Each different
choice of \(x_3\) determines a (different) solution of the system, and every solution of the
system is determined by a choice of \(x_3\).
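
A small check of description (5) may help. The helper names below are hypothetical; the sketch simply substitutes a few choices of the free variable into the formulas in (5) and confirms that each resulting triple satisfies system (4).

```python
def solution(x3):
    """Solution of system (4) determined by a choice of the free variable x3."""
    x1 = 1 + 5 * x3
    x2 = 4 - x3
    return (x1, x2, x3)

def satisfies_system_4(x1, x2, x3):
    """Check the two nontrivial equations of system (4)."""
    return x1 - 5 * x3 == 1 and x2 + x3 == 4

for t in (0, 1, -2, 10):
    point = solution(t)
    print(t, point, satisfies_system_4(*point))
```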

EXAMPLE 4 Find the general solution of the linear system whose augmented
matrix has been reduced to

\[
\begin{bmatrix}
1 & 6 & 2 & -5 & -2 & -4 \\
0 & 0 & 2 & -8 & -1 & 3 \\
0 & 0 & 0 & 0 & 1 & 7
\end{bmatrix}
\]

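SOLUTION The matrix is in echelon form, but we want the reduced echelon form
before solving for the basic variables. The row reduction is completed next. The symbol
~ before a matrix indicates that the matrix is row equivalent to the preceding matrix.

\[
\begin{bmatrix}
1 & 6 & 2 & -5 & -2 & -4 \\
0 & 0 & 2 & -8 & -1 & 3 \\
0 & 0 & 0 & 0 & 1 & 7
\end{bmatrix}
\sim
\begin{bmatrix}
1 & 6 & 2 & -5 & 0 & 10 \\
0 & 0 & 2 & -8 & 0 & 10 \\
0 & 0 & 0 & 0 & 1 & 7
\end{bmatrix}
\]

\[
\sim
\begin{bmatrix}
1 & 6 & 2 & -5 & 0 & 10 \\
0 & 0 & 1 & -4 & 0 & 5 \\
0 & 0 & 0 & 0 & 1 & 7
\end{bmatrix}
\sim
\begin{bmatrix}
1 & 6 & 0 & 3 & 0 & 0 \\
0 & 0 & 1 & -4 & 0 & 5 \\
0 & 0 & 0 & 0 & 1 & 7
\end{bmatrix}
\]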


2 Some texts use the term leading variables because they correspond to the columns containing leading 
entries. 
There are five variables because the augmented matrix has six columns. The associated
system now is

\[
\begin{aligned}
x_1 + 6x_2 + 3x_4 &= 0 \\
x_3 - 4x_4 &= 5 \\
x_5 &= 7
\end{aligned}
\tag{6}
\]


The pivot columns of the matrix are 1, 3, and 5, so the basic variables are \(x_1\), \(x_3\), and \(x_5\).
The remaining variables, \(x_2\) and \(x_4\), must be free. Solve for the basic variables to obtain
the general solution:

\[
\begin{cases}
x_1 = -6x_2 - 3x_4 \\
x_2 \text{ is free} \\
x_3 = 5 + 4x_4 \\
x_4 \text{ is free} \\
x_5 = 7
\end{cases}
\tag{7}
\]

Note that the value of \(x_5\) is already fixed by the third equation in system (6).


Parametric Descriptions of Solution Sets 

The descriptions in (5) and (7) are parametric descriptions of solution sets in which 
the free variables act as parameters. Solving a system amounts to finding a parametric 
description of the solution set or determining that the solution set is empty. 

Whenever a system is consistent and has free variables, the solution set has many
parametric descriptions. For instance, in system (4), we may add 5 times equation 2 to
equation 1 and obtain the equivalent system

\[
\begin{aligned}
x_1 + 5x_2 &= 21 \\
x_2 + x_3 &= 4
\end{aligned}
\]

We could treat \(x_2\) as a parameter and solve for \(x_1\) and \(x_3\) in terms of \(x_2\), and we would
have an accurate description of the solution set. However, to be consistent, we make the
(arbitrary) convention of always using the free variables as the parameters for describing
a solution set. (The answer section at the end of the text also reflects this convention.)

Whenever a system is inconsistent, the solution set is empty, even when the system 
has free variables. In this case, the solution set has no parametric representation. 


Back-Substitution 

Consider the following system, whose augmented matrix is in echelon form but is not
in reduced echelon form:

\[
\begin{aligned}
x_1 - 7x_2 + 2x_3 - 5x_4 + 8x_5 &= 10 \\
x_2 - 3x_3 + 3x_4 + x_5 &= -5 \\
x_4 - x_5 &= 4
\end{aligned}
\]

A computer program would solve this system by back-substitution, rather than by
computing the reduced echelon form. That is, the program would solve equation 3 for \(x_4\) in
terms of \(x_5\) and substitute the expression for \(x_4\) into equation 2, solve equation 2 for \(x_2\),
and then substitute the expressions for \(x_2\) and \(x_4\) into equation 1 and solve for \(x_1\).
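
The back-substitution just described can be sketched as follows. The function names are hypothetical, and the free variables \(x_3\) and \(x_5\) are passed in as arguments; the sketch works from the last equation up, exactly in the order stated above.

```python
def back_substitute(x3, x5):
    """Solve the echelon system above for given values of the free variables."""
    x4 = 4 + x5                                   # equation 3
    x2 = -5 + 3 * x3 - 3 * x4 - x5                # equation 2
    x1 = 10 + 7 * x2 - 2 * x3 + 5 * x4 - 8 * x5   # equation 1
    return (x1, x2, x3, x4, x5)

def residuals(x1, x2, x3, x4, x5):
    """Left side minus right side of each equation; all zeros for a solution."""
    return (x1 - 7 * x2 + 2 * x3 - 5 * x4 + 8 * x5 - 10,
            x2 - 3 * x3 + 3 * x4 + x5 + 5,
            x4 - x5 - 4)

print(back_substitute(0, 0), residuals(*back_substitute(0, 0)))
```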

Our matrix format for the backward phase of row reduction, which produces the
reduced echelon form, has the same number of arithmetic operations as back-substitution.
But the discipline of the matrix format substantially reduces the likelihood of errors
during hand computations. The best strategy is to use only the reduced echelon form
to solve a system! The Study Guide that accompanies this text offers several helpful
suggestions for performing row operations accurately and rapidly.

NUMERICAL NOTE

In general, the forward phase of row reduction takes much longer than the
backward phase. An algorithm for solving a system is usually measured in flops
(or floating point operations). A flop is one arithmetic operation (+, −, *, /)
on two real floating point numbers.³ For an \(n \times (n + 1)\) matrix, the reduction
to echelon form can take \(2n^3/3 + n^2/2 - 7n/6\) flops (which is approximately
\(2n^3/3\) flops when \(n\) is moderately large, say, \(n \geq 30\)). In contrast, further
reduction to reduced echelon form needs at most \(n^2\) flops.
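
To get a feel for these counts, the sketch below evaluates the forward-phase formula and the backward-phase bound for a few values of \(n\). The function names are hypothetical; the formulas are the ones quoted in the note.

```python
from fractions import Fraction

def forward_flops(n):
    """Flop count 2n^3/3 + n^2/2 - 7n/6 for reduction to echelon form."""
    n = Fraction(n)
    return 2 * n**3 / 3 + n**2 / 2 - 7 * n / 6

def backward_flops_bound(n):
    """Upper bound n^2 for the further reduction to reduced echelon form."""
    return n * n

for n in (3, 30, 300):
    exact = forward_flops(n)
    approx = Fraction(2 * n**3, 3)
    print(n, exact, float(approx / exact), backward_flops_bound(n))
```

The third printed column is the ratio of the approximation \(2n^3/3\) to the full formula, which approaches 1 as \(n\) grows.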


Existence and Uniqueness Questions 

Although a nonreduced echelon form is a poor tool for solving a system, this form is 
just the right device for answering two fundamental questions posed in Section 1.1. 


EXAMPLE 5 Determine the existence and uniqueness of the solutions to the system

\[
\begin{aligned}
3x_2 - 6x_3 + 6x_4 + 4x_5 &= -5 \\
3x_1 - 7x_2 + 8x_3 - 5x_4 + 8x_5 &= 9 \\
3x_1 - 9x_2 + 12x_3 - 9x_4 + 6x_5 &= 15
\end{aligned}
\]


SOLUTION The augmented matrix of this system was row reduced in Example 3 to

\[
\begin{bmatrix}
3 & -9 & 12 & -9 & 6 & 15 \\
0 & 2 & -4 & 4 & 2 & -6 \\
0 & 0 & 0 & 0 & 1 & 4
\end{bmatrix}
\tag{8}
\]

The basic variables are \(x_1\), \(x_2\), and \(x_5\); the free variables are \(x_3\) and \(x_4\). There is no
equation such as 0 = 1 that would indicate an inconsistent system, so we could use
back-substitution to find a solution. But the existence of a solution is already clear in
(8). Also, the solution is not unique because there are free variables. Each different
choice of \(x_3\) and \(x_4\) determines a different solution. Thus the system has infinitely many
solutions. ■


When a system is in echelon form and contains no equation of the form 0 = b , with 
b nonzero, every nonzero equation contains a basic variable with a nonzero coefficient. 
Either the basic variables are completely determined (with no free variables) or at least 
one of the basic variables may be expressed in terms of one or more free variables. In 
the former case, there is a unique solution; in the latter case, there are infinitely many 
solutions (one for each choice of values for the free variables). 

These remarks justify the following theorem. 


3 Traditionally, a flop was only a multiplication or division, because addition and subtraction took much less 
time and could be ignored. The definition of flop given here is preferred now, as a result of advances in 
computer architecture. See Golub and Van Loan, Matrix Computations , 2nd ed. (Baltimore: The Johns 
Hopkins Press, 1989), pp. 19-20. 
THEOREM 2 Existence and Uniqueness Theorem

A linear system is consistent if and only if the rightmost column of the augmented
matrix is not a pivot column; that is, if and only if an echelon form of the
augmented matrix has no row of the form

\[
\begin{bmatrix} 0 & \cdots & 0 & b \end{bmatrix} \quad \text{with } b \text{ nonzero}
\]

If a linear system is consistent, then the solution set contains either (i) a unique 
solution, when there are no free variables, or (ii) infinitely many solutions, when 
there is at least one free variable. 
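
Theorem 2 translates directly into a short test on an echelon form. The sketch below is illustrative only: the function name `classify` is hypothetical, and the input is assumed to be an augmented matrix that is already in echelon form.

```python
def classify(echelon):
    """Apply Theorem 2 to an augmented matrix that is already in echelon form."""
    n_vars = len(echelon[0]) - 1
    pivot_cols = []
    for row in echelon:
        # The leading entry of a nonzero row marks a pivot column.
        lead = next((j for j, a in enumerate(row) if a != 0), None)
        if lead is not None:
            pivot_cols.append(lead)
    if n_vars in pivot_cols:          # rightmost column is a pivot column
        return "inconsistent"
    if len(pivot_cols) == n_vars:     # no free variables
        return "unique solution"
    return "infinitely many solutions"

# Echelon form (8) from Example 5.
print(classify([[3, -9, 12, -9, 6, 15],
                [0, 2, -4, 4, 2, -6],
                [0, 0, 0, 0, 1, 4]]))
```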


The following procedure outlines how to find and describe all solutions of a linear 
system. 


USING ROW REDUCTION TO SOLVE A LINEAR SYSTEM 

1. Write the augmented matrix of the system. 

2. Use the row reduction algorithm to obtain an equivalent augmented matrix in 
echelon form. Decide whether the system is consistent. If there is no solution, 
stop; otherwise, go to the next step. 

3. Continue row reduction to obtain the reduced echelon form. 

4. Write the system of equations corresponding to the matrix obtained in step 3.

5. Rewrite each nonzero equation from step 4 so that its one basic variable is 
expressed in terms of any free variables appearing in the equation. 
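
Steps 4 and 5 of the procedure can be sketched in code. The function below is a hypothetical helper, not part of the text: it assumes its input is the reduced echelon form of the augmented matrix of a consistent system, and it prints each basic variable in terms of the free variables.

```python
from fractions import Fraction

def general_solution(R):
    """Describe the solution set from a reduced echelon augmented matrix R.

    Assumes the system is consistent (no row of the form [0 ... 0 b], b nonzero).
    """
    n_vars = len(R[0]) - 1
    lines = {}
    for row in R:
        lead = next((j for j in range(n_vars) if row[j] != 0), None)
        if lead is None:
            continue                              # zero row: no restriction
        terms = [str(Fraction(row[-1]))]
        for j in range(lead + 1, n_vars):
            if row[j] != 0:
                coeff = -Fraction(row[j])         # move the term to the right side
                sign = "+" if coeff > 0 else "-"
                terms.append(f"{sign} {abs(coeff)}*x{j + 1}")
        lines[lead] = f"x{lead + 1} = " + " ".join(terms)
    return [lines.get(j, f"x{j + 1} is free") for j in range(n_vars)]

# Reduced echelon form obtained in Example 4.
for line in general_solution([[1, 6, 0, 3, 0, 0],
                              [0, 0, 1, -4, 0, 5],
                              [0, 0, 0, 0, 1, 7]]):
    print(line)
```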


PRACTICE PROBLEMS 

1. Find the general solution of the linear system whose augmented matrix is

\[
\begin{bmatrix}
1 & -3 & -5 & 0 \\
0 & 1 & -1 & -1
\end{bmatrix}
\]

2. Find the general solution of the system

\[
\begin{aligned}
x_1 - 2x_2 - x_3 + 3x_4 &= 0 \\
-2x_1 + 4x_2 + 5x_3 - 5x_4 &= 3 \\
3x_1 - 6x_2 - 6x_3 + 8x_4 &= 2
\end{aligned}
\]

3. Suppose a 4 × 7 coefficient matrix for a system of equations has 4 pivots. Is the
system consistent? If the system is consistent, how many solutions are there?


1.2 EXERCISES 


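In Exercises 1 and 2, determine which matrices are in reduced
echelon form and which others are only in echelon form.

1. a. \(\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}\)

   b. \(\begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\)

   c. \(\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\)

   d. \(\begin{bmatrix} 1 & 1 & 0 & 1 & 1 \\ 0 & 2 & 0 & 2 & 2 \\ 0 & 0 & 0 & 3 & 3 \\ 0 & 0 & 0 & 0 & 4 \end{bmatrix}\)
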
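2. a. \(\begin{bmatrix} 1 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}\)

   b. \(\begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}\)

   c. \(\begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}\)

   d. \(\begin{bmatrix} 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 2 & 2 & 2 \\ 0 & 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}\)

16. a. \(\begin{bmatrix} \blacksquare & * & * \\ 0 & \blacksquare & * \\ 0 & 0 & 0 \end{bmatrix}\)

    b. \(\begin{bmatrix} \blacksquare & * & * & * & * \\ 0 & 0 & \blacksquare & * & * \\ 0 & 0 & 0 & \blacksquare & * \end{bmatrix}\)
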
In Exercises 17 and 18, determine the value(s) of h such that the
matrix is the augmented matrix of a consistent linear system.

17. \(\begin{bmatrix} 2 & 3 & h \\ 4 & 6 & 7 \end{bmatrix}\)

18. \(\begin{bmatrix} 1 & -3 & -2 \\ 5 & h & -7 \end{bmatrix}\)


Row reduce the matrices in Exercises 3 and 4 to reduced echelon
form. Circle the pivot positions in the final matrix and in the
original matrix, and list the pivot columns.

3. \(\begin{bmatrix} 1 & 2 & 3 & 4 \\ 4 & 5 & 6 & 7 \\ 6 & 7 & 8 & 9 \end{bmatrix}\)

4. \(\begin{bmatrix} 1 & 3 & 5 & 7 \\ 3 & 5 & 7 & 9 \\ 5 & 7 & 9 & 1 \end{bmatrix}\)


5. Describe the possible echelon forms of a nonzero 2 × 2
matrix. Use the symbols ■, *, and 0, as in the first part of
Example 1.

6. Repeat Exercise 5 for a nonzero 3 × 2 matrix.


Find the general solutions of the systems whose augmented
matrices are given in Exercises 7-14.


7. 

9. 











1 

0 

0 

0 

1 

0 

0 

0 


-4 2 

12 -6 
8 -4 









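Exercises 15 and 16 use the notation of Example 1 for matrices
in echelon form. Suppose each matrix represents the augmented
matrix for a system of linear equations. In each case, determine if
the system is consistent. If the system is consistent, determine if
the solution is unique.

15. a. \(\begin{bmatrix} \blacksquare & * & * & * \\ 0 & \blacksquare & * & * \\ 0 & 0 & \blacksquare & 0 \end{bmatrix}\)

    b. \(\begin{bmatrix} 0 & \blacksquare & * & * & * \\ 0 & 0 & \blacksquare & * & * \\ 0 & 0 & 0 & 0 & \blacksquare \end{bmatrix}\)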


In Exercises 19 and 20, choose h and k such that the system has
(a) no solution, (b) a unique solution, and (c) many solutions. Give
separate answers for each part.

19. \(x_1 + hx_2 = 2\)
    \(4x_1 + 8x_2 = k\)

20. \(x_1 + 3x_2 = 2\)
    \(3x_1 + hx_2 = k\)

In Exercises 21 and 22, mark each statement True or False. Justify
each answer.⁴

21. a. In some cases, a matrix may be row reduced to more
than one matrix in reduced echelon form, using different
sequences of row operations.

b. The row reduction algorithm applies only to augmented
matrices for a linear system.

c. A basic variable in a linear system is a variable that
corresponds to a pivot column in the coefficient matrix.

d. Finding a parametric description of the solution set of a
linear system is the same as solving the system.

e. If one row in an echelon form of an augmented matrix
is [0 0 0 5 0], then the associated linear system is
inconsistent.

22. a. The echelon form of a matrix is unique.

b. The pivot positions in a matrix depend on whether row
interchanges are used in the row reduction process.

c. Reducing a matrix to echelon form is called the forward
phase of the row reduction process.

d. Whenever a system has free variables, the solution set
contains many solutions.

e. A general solution of a system is an explicit description
of all solutions of the system.

23. Suppose a 3 × 5 coefficient matrix for a system has three
pivot columns. Is the system consistent? Why or why not?

24. Suppose a system of linear equations has a 3 × 5 augmented
matrix whose fifth column is a pivot column. Is the system
consistent? Why (or why not)?


4 True/false questions of this type will appear in many sections. Methods 
for justifying your answers were described before Exercises 23 and 24 in 
Section 1.1. 
25. Suppose the coefficient matrix of a system of linear equations
has a pivot position in every row. Explain why the system is 
consistent. 

26. Suppose the coefficient matrix of a linear system of three 
equations in three variables has a pivot in each column. 
Explain why the system has a unique solution. 

27. Restate the last sentence in Theorem 2 using the concept
of pivot columns: “If a linear system is consistent, then the
solution is unique if and only if ______.”

28. What would you have to know about the pivot columns in an
augmented matrix in order to know that the linear system is
consistent and has a unique solution?

29. A system of linear equations with fewer equations than
unknowns is sometimes called an underdetermined system.
Suppose that such a system happens to be consistent. Explain
why there must be an infinite number of solutions.

30. Give an example of an inconsistent underdetermined system
of two equations in three unknowns.

31. A system of linear equations with more equations than
unknowns is sometimes called an overdetermined system. Can
such a system be consistent? Illustrate your answer with a
specific system of three equations in two unknowns.

32. Suppose an \(n \times (n + 1)\) matrix is row reduced to reduced
echelon form. Approximately what fraction of the total
number of operations (flops) is involved in the backward phase of
the reduction when \(n = 30\)? when \(n = 300\)?

Suppose experimental data are represented by a set of points
in the plane. An interpolating polynomial for the data is a
polynomial whose graph passes through every point. In scientific
work, such a polynomial can be used, for example, to estimate
values between the known data points. Another use is to create
curves for graphical images on a computer screen. One method for
finding an interpolating polynomial is to solve a system of linear
equations.

33. Find the interpolating polynomial \(p(t) = a_0 + a_1 t + a_2 t^2\)
for the data (1, 12), (2, 15), (3, 16). That is, find \(a_0\), \(a_1\), and
\(a_2\) such that

\[
\begin{aligned}
a_0 + a_1(1) + a_2(1)^2 &= 12 \\
a_0 + a_1(2) + a_2(2)^2 &= 15 \\
a_0 + a_1(3) + a_2(3)^2 &= 16
\end{aligned}
\]

34. [M] In a wind tunnel experiment, the force on a projectile
due to air resistance was measured at different velocities:

Velocity (100 ft/sec)   0   2      4      6      8      10
Force (100 lb)          0   2.90   14.8   39.6   74.3   119

Find an interpolating polynomial for these data and estimate
the force on the projectile when the projectile is traveling
at 750 ft/sec. Use \(p(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4 + a_5 t^5\).
What happens if you try to use a polynomial of degree
less than 5? (Try a cubic polynomial, for instance.)⁵


5 Exercises marked with the symbol [M] are designed to be worked 
with the aid of a “Matrix program” (a computer program, such as 
MATLAB, Maple, Mathematica, MathCad, or Derive, or a 
programmable calculator with matrix capabilities, such as those 
manufactured by Texas Instruments or Hewlett-Packard). 


SOLUTIONS TO PRACTICE PROBLEMS 



The general solution of the 
system of equations is the line of 
intersection of the two planes. 


1. The reduced echelon form of the augmented matrix and the corresponding system
are

\[
\begin{bmatrix}
1 & 0 & -8 & -3 \\
0 & 1 & -1 & -1
\end{bmatrix}
\quad \text{and} \quad
\begin{aligned}
x_1 - 8x_3 &= -3 \\
x_2 - x_3 &= -1
\end{aligned}
\]

The basic variables are \(x_1\) and \(x_2\), and the general solution is

\[
\begin{cases}
x_1 = -3 + 8x_3 \\
x_2 = -1 + x_3 \\
x_3 \text{ is free}
\end{cases}
\]

Note: It is essential that the general solution describe each variable, with any
parameters clearly identified. The following statement does not describe the solution:

\[
\begin{cases}
x_1 = -3 + 8x_3 \\
x_2 = -1 + x_3 \\
x_3 = 1 + x_2
\end{cases}
\qquad \text{Incorrect solution}
\]

This description implies that \(x_2\) and \(x_3\) are both free, which certainly is not the case.
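2. Row reduce the system’s augmented matrix:

\[
\begin{bmatrix}
1 & -2 & -1 & 3 & 0 \\
-2 & 4 & 5 & -5 & 3 \\
3 & -6 & -6 & 8 & 2
\end{bmatrix}
\sim
\begin{bmatrix}
1 & -2 & -1 & 3 & 0 \\
0 & 0 & 3 & 1 & 3 \\
0 & 0 & -3 & -1 & 2
\end{bmatrix}
\sim
\begin{bmatrix}
1 & -2 & -1 & 3 & 0 \\
0 & 0 & 3 & 1 & 3 \\
0 & 0 & 0 & 0 & 5
\end{bmatrix}
\]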

This echelon matrix shows that the system is inconsistent , because its rightmost 
column is a pivot column; the third row corresponds to the equation 0 = 5. There 
is no need to perform any more row operations. Note that the presence of the free 
variables in this problem is irrelevant because the system is inconsistent. 

3. Since the coefficient matrix has four pivots, there is a pivot in every row of the
coefficient matrix. This means that when the coefficient matrix is row reduced, it
will not have a row of zeros, thus the corresponding row reduced augmented matrix
can never have a row of the form \([0 \; 0 \; \cdots \; 0 \; b]\), where b is a nonzero number. By
Theorem 2, the system is consistent. Moreover, since there are seven columns in
the coefficient matrix and only four pivot columns, there will be three free variables
resulting in infinitely many solutions.


1.3 VECTOR EQUATIONS 


Important properties of linear systems can be described with the concept and notation 
of vectors. This section connects equations involving vectors to ordinary systems of 
equations. The term vector appears in a variety of mathematical and physical contexts, 
which we will discuss in Chapter 4, “Vector Spaces.” Until then, vector will mean an 
ordered list of numbers. This simple idea enables us to get to interesting and important 
applications as quickly as possible. 


Vectors in \(\mathbb{R}^2\)


A matrix with only one column is called a column vector, or simply a vector. An example
of a vector with two entries is

\[
\mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}
\]

where \(w_1\) and \(w_2\) are any real numbers. The set of all vectors with two entries is denoted
by \(\mathbb{R}^2\) (read “r-two”). The \(\mathbb{R}\) stands for the real numbers that appear as entries in the
vectors, and the exponent 2 indicates that each vector contains two entries.¹

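Two vectors in \(\mathbb{R}^2\) are equal if and only if their corresponding entries are equal.
Thus \(\begin{bmatrix} 4 \\ 7 \end{bmatrix}\) and \(\begin{bmatrix} 7 \\ 4 \end{bmatrix}\) are not equal, because vectors in \(\mathbb{R}^2\) are ordered pairs of real
numbers.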


1 Most of the text concerns vectors and matrices that have only real entries. However, all definitions and 
theorems in Chapters 1-5, and in most of the rest of the text, remain valid if the entries are complex 
numbers. Complex vectors and matrices arise naturally, for example, in electrical engineering and physics. 
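Given two vectors \(\mathbf{u}\) and \(\mathbf{v}\) in \(\mathbb{R}^2\), their sum is the vector \(\mathbf{u} + \mathbf{v}\) obtained by adding
corresponding entries of \(\mathbf{u}\) and \(\mathbf{v}\). For example,

\[
\begin{bmatrix} 1 \\ -2 \end{bmatrix} + \begin{bmatrix} 2 \\ 5 \end{bmatrix}
= \begin{bmatrix} 1 + 2 \\ -2 + 5 \end{bmatrix}
= \begin{bmatrix} 3 \\ 3 \end{bmatrix}
\]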


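Given a vector \(\mathbf{u}\) and a real number \(c\), the scalar multiple of \(\mathbf{u}\) by \(c\) is the vector \(c\mathbf{u}\)
obtained by multiplying each entry in \(\mathbf{u}\) by \(c\). For instance,

\[
\text{if } \mathbf{u} = \begin{bmatrix} 3 \\ -1 \end{bmatrix} \text{ and } c = 5,
\quad \text{then } c\mathbf{u} = 5 \begin{bmatrix} 3 \\ -1 \end{bmatrix} = \begin{bmatrix} 15 \\ -5 \end{bmatrix}
\]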

The number c in cu is called a scalar; it is written in lightface type to distinguish it from 
the boldface vector u. 

The operations of scalar multiplication and vector addition can be combined, as in 
the following example. 


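EXAMPLE 1 Given \(\mathbf{u} = \begin{bmatrix} 1 \\ -2 \end{bmatrix}\) and \(\mathbf{v} = \begin{bmatrix} 2 \\ -5 \end{bmatrix}\), find \(4\mathbf{u}\), \((-3)\mathbf{v}\), and \(4\mathbf{u} + (-3)\mathbf{v}\).

SOLUTION

\[
4\mathbf{u} = \begin{bmatrix} 4 \\ -8 \end{bmatrix}, \qquad
(-3)\mathbf{v} = \begin{bmatrix} -6 \\ 15 \end{bmatrix}
\]

and

\[
4\mathbf{u} + (-3)\mathbf{v} = \begin{bmatrix} 4 \\ -8 \end{bmatrix} + \begin{bmatrix} -6 \\ 15 \end{bmatrix} = \begin{bmatrix} -2 \\ 7 \end{bmatrix}
\]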
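
The entry-by-entry definitions of vector addition and scalar multiplication are easy to sketch in code. The helper names `add` and `scale` are hypothetical, and vectors are represented as plain Python lists; the sketch reproduces the computation in Example 1.

```python
def add(u, v):
    """Sum of two vectors: add corresponding entries."""
    assert len(u) == len(v), "vectors must have the same number of entries"
    return [a + b for a, b in zip(u, v)]

def scale(c, u):
    """Scalar multiple c*u: multiply each entry of u by c."""
    return [c * a for a in u]

u = [1, -2]
v = [2, -5]
print(scale(4, u))                        # 4u
print(scale(-3, v))                       # (-3)v
print(add(scale(4, u), scale(-3, v)))     # 4u + (-3)v
```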


Sometimes, for convenience (and also to save space), this text may write a column
vector such as \(\begin{bmatrix} 3 \\ -1 \end{bmatrix}\) in the form \((3, -1)\). In this case, the parentheses and the comma
distinguish the vector \((3, -1)\) from the 1 × 2 row matrix \([\,3 \;\; -1\,]\), written with brackets
and no comma. Thus

\[
\begin{bmatrix} 3 \\ -1 \end{bmatrix} \neq \begin{bmatrix} 3 & -1 \end{bmatrix}
\]

because the matrices have different shapes, even though they have the same entries.


Geometric Descriptions of \(\mathbb{R}^2\)


Consider a rectangular coordinate system in the plane. Because each point in the plane
is determined by an ordered pair of numbers, we can identify a geometric point \((a, b)\)
with the column vector \(\begin{bmatrix} a \\ b \end{bmatrix}\). So we may regard \(\mathbb{R}^2\) as the set of all points in the plane.
See Figure 1.


FIGURE 1 Vectors as points.


FIGURE 2 Vectors with arrows.
The geometric visualization of a vector such as \(\begin{bmatrix} 3 \\ -1 \end{bmatrix}\) is often aided by including an
arrow (directed line segment) from the origin \((0, 0)\) to the point \((3, -1)\), as in Figure 2.
In this case, the individual points along the arrow itself have no special significance.²

The sum of two vectors has a useful geometric representation. The following rule 
can be verified by analytic geometry. 


Parallelogram Rule for Addition

If \(\mathbf{u}\) and \(\mathbf{v}\) in \(\mathbb{R}^2\) are represented as points in the plane, then \(\mathbf{u} + \mathbf{v}\) corresponds to
the fourth vertex of the parallelogram whose other vertices are \(\mathbf{u}\), \(\mathbf{0}\), and \(\mathbf{v}\). See
Figure 3.


FIGURE 3 The parallelogram rule.


EXAMPLE 2 The vectors \(\mathbf{u}\), \(\mathbf{v}\), and \(\mathbf{u} + \mathbf{v}\) are displayed in Figure 4.


FIGURE 4


The next example illustrates the fact that the set of all scalar multiples of one fixed
nonzero vector is a line through the origin, \((0, 0)\).

EXAMPLE 3 Let \(\mathbf{u} = \begin{bmatrix} 3 \\ -1 \end{bmatrix}\). Display the vectors \(\mathbf{u}\), \(2\mathbf{u}\), and \(-\tfrac{2}{3}\mathbf{u}\) on a graph.

SOLUTION See Figure 5, where \(\mathbf{u}\), \(2\mathbf{u} = \begin{bmatrix} 6 \\ -2 \end{bmatrix}\), and \(-\tfrac{2}{3}\mathbf{u} = \begin{bmatrix} -2 \\ 2/3 \end{bmatrix}\) are displayed.
The arrow for \(2\mathbf{u}\) is twice as long as the arrow for \(\mathbf{u}\), and the arrows point in the same
direction. The arrow for \(-\tfrac{2}{3}\mathbf{u}\) is two-thirds the length of the arrow for \(\mathbf{u}\), and the arrows
point in opposite directions. In general, the length of the arrow for \(c\mathbf{u}\) is \(|c|\) times the
length of the arrow for \(\mathbf{u}\). [Recall that the length of the line segment from \((0, 0)\) to \((a, b)\)
is \(\sqrt{a^2 + b^2}\). We shall discuss this further in Chapter 6.]

2 In physics, arrows can represent forces and usually are free to move about in space. This interpretation of 
vectors will be discussed in Section 4.1. 
FIGURE 5 Typical multiples of \(\mathbf{u}\); the set of all multiples of \(\mathbf{u}\).


FIGURE 6 Scalar multiples.


Vectors in \(\mathbb{R}^3\)

Vectors in \(\mathbb{R}^3\) are 3 × 1 column matrices with three entries. They are represented
geometrically by points in a three-dimensional coordinate space, with arrows from the
origin sometimes included for visual clarity. The vectors \(\mathbf{a} = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}\) and \(2\mathbf{a}\) are displayed
in Figure 6.

Vectors in \(\mathbb{R}^n\)


If \(n\) is a positive integer, \(\mathbb{R}^n\) (read “r-n”) denotes the collection of all lists (or ordered
\(n\)-tuples) of \(n\) real numbers, usually written as \(n \times 1\) column matrices, such as

\[
\mathbf{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}
\]

The vector whose entries are all zero is called the zero vector and is denoted by \(\mathbf{0}\).
(The number of entries in \(\mathbf{0}\) will be clear from the context.)

Equality of vectors in \(\mathbb{R}^n\) and the operations of scalar multiplication and vector
addition in \(\mathbb{R}^n\) are defined entry by entry just as in \(\mathbb{R}^2\). These operations on vectors
have the following properties, which can be verified directly from the corresponding
properties for real numbers. See Practice Problem 1 and Exercises 33 and 34 at the end
of this section.




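Algebraic Properties of \(\mathbb{R}^n\)

For all \(\mathbf{u}\), \(\mathbf{v}\), \(\mathbf{w}\) in \(\mathbb{R}^n\) and all scalars \(c\) and \(d\):

(i) \(\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}\)
(ii) \((\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})\)
(iii) \(\mathbf{u} + \mathbf{0} = \mathbf{0} + \mathbf{u} = \mathbf{u}\)
(iv) \(\mathbf{u} + (-\mathbf{u}) = -\mathbf{u} + \mathbf{u} = \mathbf{0}\), where \(-\mathbf{u}\) denotes \((-1)\mathbf{u}\)
(v) \(c(\mathbf{u} + \mathbf{v}) = c\mathbf{u} + c\mathbf{v}\)
(vi) \((c + d)\mathbf{u} = c\mathbf{u} + d\mathbf{u}\)
(vii) \(c(d\mathbf{u}) = (cd)\mathbf{u}\)
(viii) \(1\mathbf{u} = \mathbf{u}\)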





FIGURE 7 Vector subtraction.


For simplicity of notation, a vector such as \(\mathbf{u} + (-1)\mathbf{v}\) is often written as \(\mathbf{u} - \mathbf{v}\).
Figure 7 shows \(\mathbf{u} - \mathbf{v}\) as the sum of \(\mathbf{u}\) and \(-\mathbf{v}\).
Linear Combinations


Given vectors \(\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_p\) in \(\mathbb{R}^n\) and given scalars \(c_1, c_2, \ldots, c_p\), the vector \(\mathbf{y}\) defined
by

\[
\mathbf{y} = c_1\mathbf{v}_1 + \cdots + c_p\mathbf{v}_p
\]

is called a linear combination of \(\mathbf{v}_1, \ldots, \mathbf{v}_p\) with weights \(c_1, \ldots, c_p\). Property (ii) above
permits us to omit parentheses when forming such a linear combination. The weights in
a linear combination can be any real numbers, including zero. For example, some linear
combinations of vectors \(\mathbf{v}_1\) and \(\mathbf{v}_2\) are

\[
\sqrt{3}\,\mathbf{v}_1 + \mathbf{v}_2, \qquad \tfrac{1}{2}\mathbf{v}_1 \; (= \tfrac{1}{2}\mathbf{v}_1 + 0\mathbf{v}_2), \qquad \text{and} \qquad \mathbf{0} \; (= 0\mathbf{v}_1 + 0\mathbf{v}_2)
\]


EXAMPLE 4 Figure 8 identifies selected linear combinations of \(\mathbf{v}_1\) and \(\mathbf{v}_2\).
(Note that sets of parallel grid lines are drawn through integer multiples of
\(\mathbf{v}_1\) and \(\mathbf{v}_2\).) Estimate the linear combinations of \(\mathbf{v}_1\) and \(\mathbf{v}_2\) that generate the vectors \(\mathbf{u}\) and
\(\mathbf{w}\).


FIGURE 8 Linear combinations of \(\mathbf{v}_1\) and \(\mathbf{v}_2\).


FIGURE 9


SOLUTION The parallelogram rule shows that \(\mathbf{u}\) is the sum of \(3\mathbf{v}_1\) and \(-2\mathbf{v}_2\); that is,

\[
\mathbf{u} = 3\mathbf{v}_1 - 2\mathbf{v}_2
\]

This expression for \(\mathbf{u}\) can be interpreted as instructions for traveling from the origin to \(\mathbf{u}\)
along two straight paths. First, travel 3 units in the \(\mathbf{v}_1\) direction to \(3\mathbf{v}_1\), and then travel \(-2\)
units in the \(\mathbf{v}_2\) direction (parallel to the line through \(\mathbf{v}_2\) and \(\mathbf{0}\)). Next, although the vector
\(\mathbf{w}\) is not on a grid line, \(\mathbf{w}\) appears to be about halfway between two pairs of grid lines,
at the vertex of a parallelogram determined by \((5/2)\mathbf{v}_1\) and \((-1/2)\mathbf{v}_2\). (See Figure 9.)
Thus a reasonable estimate for \(\mathbf{w}\) is

\[
\mathbf{w} = \tfrac{5}{2}\mathbf{v}_1 - \tfrac{1}{2}\mathbf{v}_2 \qquad ■
\]

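The next example connects a problem about linear combinations to the fundamental
existence question studied in Sections 1.1 and 1.2.

EXAMPLE 5 Let \(\mathbf{a}_1 = \begin{bmatrix} 1 \\ -2 \\ -5 \end{bmatrix}\), \(\mathbf{a}_2 = \begin{bmatrix} 2 \\ 5 \\ 6 \end{bmatrix}\), and \(\mathbf{b} = \begin{bmatrix} 7 \\ 4 \\ -3 \end{bmatrix}\). Determine whether
\(\mathbf{b}\) can be generated (or written) as a linear combination of \(\mathbf{a}_1\) and \(\mathbf{a}_2\). That is, determine
whether weights \(x_1\) and \(x_2\) exist such that

\[
x_1\mathbf{a}_1 + x_2\mathbf{a}_2 = \mathbf{b}
\tag{1}
\]

If vector equation (1) has a solution, find it.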
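SOLUTION Use the definitions of scalar multiplication and vector addition to rewrite
the vector equation

\[
x_1 \begin{bmatrix} 1 \\ -2 \\ -5 \end{bmatrix} + x_2 \begin{bmatrix} 2 \\ 5 \\ 6 \end{bmatrix} = \begin{bmatrix} 7 \\ 4 \\ -3 \end{bmatrix}
\]

(the three vectors are \(\mathbf{a}_1\), \(\mathbf{a}_2\), and \(\mathbf{b}\), in that order), which is the same as

\[
\begin{bmatrix} x_1 \\ -2x_1 \\ -5x_1 \end{bmatrix} + \begin{bmatrix} 2x_2 \\ 5x_2 \\ 6x_2 \end{bmatrix} = \begin{bmatrix} 7 \\ 4 \\ -3 \end{bmatrix}
\]

and

\[
\begin{bmatrix} x_1 + 2x_2 \\ -2x_1 + 5x_2 \\ -5x_1 + 6x_2 \end{bmatrix} = \begin{bmatrix} 7 \\ 4 \\ -3 \end{bmatrix}
\tag{2}
\]

The vectors on the left and right sides of (2) are equal if and only if their corresponding
entries are both equal. That is, \(x_1\) and \(x_2\) make the vector equation (1) true if and only
if \(x_1\) and \(x_2\) satisfy the system

\[
\begin{aligned}
x_1 + 2x_2 &= 7 \\
-2x_1 + 5x_2 &= 4 \\
-5x_1 + 6x_2 &= -3
\end{aligned}
\tag{3}
\]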

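To solve this system, row reduce the augmented matrix of the system as follows:³

\[
\begin{bmatrix} 1 & 2 & 7 \\ -2 & 5 & 4 \\ -5 & 6 & -3 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 2 & 7 \\ 0 & 9 & 18 \\ 0 & 16 & 32 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 2 & 7 \\ 0 & 1 & 2 \\ 0 & 16 & 32 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 0 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{bmatrix}
\]

The solution of (3) is \(x_1 = 3\) and \(x_2 = 2\). Hence \(\mathbf{b}\) is a linear combination of \(\mathbf{a}_1\) and \(\mathbf{a}_2\),
with weights \(x_1 = 3\) and \(x_2 = 2\). That is,

\[
3 \begin{bmatrix} 1 \\ -2 \\ -5 \end{bmatrix} + 2 \begin{bmatrix} 2 \\ 5 \\ 6 \end{bmatrix} = \begin{bmatrix} 7 \\ 4 \\ -3 \end{bmatrix} \qquad ■
\]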



Observe in Example 5 that the original vectors \(\mathbf{a}_1\), \(\mathbf{a}_2\), and \(\mathbf{b}\) are the columns of the
augmented matrix that we row reduced:

\[
\begin{bmatrix} 1 & 2 & 7 \\ -2 & 5 & 4 \\ -5 & 6 & -3 \end{bmatrix}
\]

(the columns are \(\mathbf{a}_1\), \(\mathbf{a}_2\), and \(\mathbf{b}\), from left to right). For brevity, write this matrix in a way that identifies its columns, namely,

\[
\begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \mathbf{b} \end{bmatrix}
\tag{4}
\]

It is clear how to write this augmented matrix immediately from vector equation (1),
without going through the intermediate steps of Example 5. Take the vectors in the
order in which they appear in (1) and put them into the columns of a matrix as in (4).
The discussion above is easily modified to establish the following fundamental fact.


3 The symbol ~ between matrices denotes row equivalence (Section 1.2). 
A vector equation

\[
x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \cdots + x_n\mathbf{a}_n = \mathbf{b}
\]

has the same solution set as the linear system whose augmented matrix is

\[
\begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_n & \mathbf{b} \end{bmatrix}
\tag{5}
\]

In particular, \(\mathbf{b}\) can be generated by a linear combination of \(\mathbf{a}_1, \ldots, \mathbf{a}_n\) if and only
if there exists a solution to the linear system corresponding to the matrix (5).


One of the key ideas in linear algebra is to study the set of all vectors that can be
generated or written as a linear combination of a fixed set \(\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}\) of vectors.


DEFINITION

If \(\mathbf{v}_1, \ldots, \mathbf{v}_p\) are in \(\mathbb{R}^n\), then the set of all linear combinations of \(\mathbf{v}_1, \ldots, \mathbf{v}_p\)
is denoted by Span\(\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}\) and is called the subset of \(\mathbb{R}^n\) spanned (or
generated) by \(\mathbf{v}_1, \ldots, \mathbf{v}_p\). That is, Span\(\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}\) is the collection of all vectors
that can be written in the form

\[
c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_p\mathbf{v}_p
\]

with \(c_1, \ldots, c_p\) scalars.


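Asking whether a vector \(\mathbf{b}\) is in Span\(\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}\) amounts to asking whether the
vector equation

\[
x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + \cdots + x_p\mathbf{v}_p = \mathbf{b}
\]

has a solution, or, equivalently, asking whether the linear system with augmented matrix
\([\,\mathbf{v}_1 \; \cdots \; \mathbf{v}_p \;\; \mathbf{b}\,]\) has a solution.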

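Note that Span\(\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}\) contains every scalar multiple of \(\mathbf{v}_1\) (for example),
since \(c\mathbf{v}_1 = c\mathbf{v}_1 + 0\mathbf{v}_2 + \cdots + 0\mathbf{v}_p\). In particular, the zero vector must be in
Span\(\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}\).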
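
The membership question can be sketched as a small program. This is an illustrative sketch under stated assumptions: the function name `in_span` is hypothetical, vectors are plain lists, and exact `Fraction` arithmetic is used. It builds the augmented matrix whose columns are the given vectors followed by \(\mathbf{b}\), carries out the forward phase on the coefficient columns, and then checks for a row of the form \([0 \; \cdots \; 0 \; b]\) with \(b\) nonzero.

```python
from fractions import Fraction

def in_span(vectors, b):
    """Return True if b is a linear combination of the given vectors."""
    m = len(b)
    # Augmented matrix [v1 ... vp b]: the vectors become the columns.
    A = [[Fraction(v[i]) for v in vectors] + [Fraction(b[i])] for i in range(m)]
    n = len(A[0])
    r = 0
    for c in range(n - 1):                # pivots in coefficient columns only
        if r == m:
            break
        p = next((i for i in range(r, m) if A[i][c] != 0), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]
        for i in range(r + 1, m):
            f = A[i][c] / A[r][c]
            A[i] = [x - f * y for x, y in zip(A[i], A[r])]
        r += 1
    # Rows r, ..., m-1 have zero coefficients; the system is consistent
    # exactly when their right-hand entries are zero as well.
    return all(A[i][-1] == 0 for i in range(r, m))

a1, a2 = [1, -2, -5], [2, 5, 6]
print(in_span([a1, a2], [7, 4, -3]))      # the vector b of Example 5
print(in_span([a1, a2], [7, 4, -2]))      # b with its last entry changed
```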


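A Geometric Description of Span{v} and Span{u, v}


Let \(\mathbf{v}\) be a nonzero vector in \(\mathbb{R}^3\). Then Span\(\{\mathbf{v}\}\) is the set of all scalar multiples of \(\mathbf{v}\),
which is the set of points on the line in \(\mathbb{R}^3\) through \(\mathbf{v}\) and \(\mathbf{0}\). See Figure 10.

If \(\mathbf{u}\) and \(\mathbf{v}\) are nonzero vectors in \(\mathbb{R}^3\), with \(\mathbf{v}\) not a multiple of \(\mathbf{u}\), then Span\(\{\mathbf{u}, \mathbf{v}\}\) is
the plane in \(\mathbb{R}^3\) that contains \(\mathbf{u}\), \(\mathbf{v}\), and \(\mathbf{0}\). In particular, Span\(\{\mathbf{u}, \mathbf{v}\}\) contains the line in
\(\mathbb{R}^3\) through \(\mathbf{u}\) and \(\mathbf{0}\) and the line through \(\mathbf{v}\) and \(\mathbf{0}\). See Figure 11.


FIGURE 10 Span{v} as a line through the origin.


FIGURE 11 Span{u, v} as a plane through the origin.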
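EXAMPLE 6 Let \(\mathbf{a}_1 = \begin{bmatrix} 1 \\ -2 \\ 3 \end{bmatrix}\), \(\mathbf{a}_2 = \begin{bmatrix} 5 \\ -13 \\ -3 \end{bmatrix}\), and \(\mathbf{b} = \begin{bmatrix} -3 \\ 8 \\ 1 \end{bmatrix}\). Then
Span\(\{\mathbf{a}_1, \mathbf{a}_2\}\) is a plane through the origin in \(\mathbb{R}^3\). Is \(\mathbf{b}\) in that plane?

SOLUTION Does the equation \(x_1\mathbf{a}_1 + x_2\mathbf{a}_2 = \mathbf{b}\) have a solution? To answer this, row
reduce the augmented matrix \([\,\mathbf{a}_1 \;\; \mathbf{a}_2 \;\; \mathbf{b}\,]\):

\[
\begin{bmatrix} 1 & 5 & -3 \\ -2 & -13 & 8 \\ 3 & -3 & 1 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 5 & -3 \\ 0 & -3 & 2 \\ 0 & -18 & 10 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 5 & -3 \\ 0 & -3 & 2 \\ 0 & 0 & -2 \end{bmatrix}
\]

The third equation is \(0 = -2\), which shows that the system has no solution. The vector
equation \(x_1\mathbf{a}_1 + x_2\mathbf{a}_2 = \mathbf{b}\) has no solution, and so \(\mathbf{b}\) is not in Span\(\{\mathbf{a}_1, \mathbf{a}_2\}\). ■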


Linear Combinations in Applications 

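The final example shows how scalar multiples and linear combinations can arise when
a quantity such as “cost” is broken down into several categories. The basic principle for
the example concerns the cost of producing several units of an item when the cost per
unit is known:

\[
\left\{ \begin{array}{c} \text{number} \\ \text{of units} \end{array} \right\}
\cdot
\left\{ \begin{array}{c} \text{cost} \\ \text{per unit} \end{array} \right\}
=
\left\{ \begin{array}{c} \text{total} \\ \text{cost} \end{array} \right\}
\]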


EXAMPLE 7 A company manufactures two products. For $1.00 worth of product 
B, the company spends $.45 on materials, $.25 on labor, and $.15 on overhead. For $1.00 
worth of product C, the company spends $.40 on materials, $.30 on labor, and $.15 on 
overhead. Let 



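\[
\mathbf{b} = \begin{bmatrix} .45 \\ .25 \\ .15 \end{bmatrix} \quad \text{and} \quad \mathbf{c} = \begin{bmatrix} .40 \\ .30 \\ .15 \end{bmatrix}
\]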


Then b and c represent the “costs per dollar of income” for the two products. 

a. What economic interpretation can be given to the vector 100b? 

b. Suppose the company wishes to manufacture \(x_1\) dollars worth of product B and \(x_2\) dollars worth of product C. Give a vector that describes the various costs the company will have (for materials, labor, and overhead).

SOLUTION 

a. Compute

\[
100\mathbf{b} = 100 \begin{bmatrix} .45 \\ .25 \\ .15 \end{bmatrix} = \begin{bmatrix} 45 \\ 25 \\ 15 \end{bmatrix}
\]

The vector \(100\mathbf{b}\) lists the various costs for producing $100 worth of product B, namely, $45 for materials, $25 for labor, and $15 for overhead.

b. The costs of manufacturing \(x_1\) dollars worth of B are given by the vector \(x_1\mathbf{b}\), and the costs of manufacturing \(x_2\) dollars worth of C are given by \(x_2\mathbf{c}\). Hence the total costs for both products are given by the vector \(x_1\mathbf{b} + x_2\mathbf{c}\). ■
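The cost vector \(x_1\mathbf{b} + x_2\mathbf{c}\) of Example 7 can be evaluated for specific production levels. The sketch below is only an illustration: the function names are hypothetical, and the plan of $100 of B with $200 of C is an invented input, not data from the example. Fractions keep the cent amounts exact.

```python
from fractions import Fraction

# Costs per $1.00 of output (materials, labor, overhead), from Example 7.
b = [Fraction(45, 100), Fraction(25, 100), Fraction(15, 100)]
c = [Fraction(40, 100), Fraction(30, 100), Fraction(15, 100)]

def linear_combination(weights, vectors):
    """Return the entry-by-entry sum of weight * vector over the paired inputs."""
    size = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(size)]

def total_costs(x1, x2):
    """Costs of producing x1 dollars of B and x2 dollars of C, that is, x1*b + x2*c."""
    return linear_combination([x1, x2], [b, c])

costs_part_a = total_costs(100, 0)    # the vector 100b from part (a)
costs_plan = total_costs(100, 200)    # hypothetical plan: $100 of B and $200 of C
print(costs_part_a, costs_plan)
```

With only product B in the plan, the result reproduces the vector \(100\mathbf{b}\) computed in part (a).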
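PRACTICE PROBLEMS

1. Prove that \(\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}\) for any \(\mathbf{u}\) and \(\mathbf{v}\) in \(\mathbb{R}^n\).

2. For what value(s) of \(h\) will \(\mathbf{y}\) be in \(\operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}\) if

\[
\mathbf{v}_1 = \begin{bmatrix} 1 \\ -1 \\ -2 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} 5 \\ -4 \\ -7 \end{bmatrix}, \quad
\mathbf{v}_3 = \begin{bmatrix} -3 \\ 1 \\ 0 \end{bmatrix}, \quad \text{and} \quad
\mathbf{y} = \begin{bmatrix} -4 \\ 3 \\ h \end{bmatrix}
\]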


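3. Let \(\mathbf{w}_1\), \(\mathbf{w}_2\), \(\mathbf{w}_3\), \(\mathbf{u}\), and \(\mathbf{v}\) be vectors in \(\mathbb{R}^n\). Suppose the vectors \(\mathbf{u}\) and \(\mathbf{v}\) are in \(\operatorname{Span}\{\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3\}\). Show that \(\mathbf{u} + \mathbf{v}\) is also in \(\operatorname{Span}\{\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3\}\). [Hint: The solution to Practice Problem 3 requires the use of the definition of the span of a set of vectors. It is useful to review this definition on Page 30 before starting this exercise.]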


1.3 EXERCISES 


In Exercises 1 and 2, compute \(\mathbf{u} + \mathbf{v}\) and \(\mathbf{u} - 2\mathbf{v}\).


1. u = 



2. u = 




In Exercises 9 and 10, write a vector equation that is equivalent to 
the given system of equations. 

9.
\[
\begin{aligned}
x_2 + 5x_3 &= 0 \\
4x_1 + 6x_2 - x_3 &= 0 \\
-x_1 + 3x_2 - 8x_3 &= 0
\end{aligned}
\]

10.
\[
\begin{aligned}
4x_1 + x_2 + 3x_3 &= 9 \\
x_1 - 7x_2 - 2x_3 &= 2 \\
8x_1 + 6x_2 - 5x_3 &= 15
\end{aligned}
\]


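In Exercises 3 and 4, display the following vectors using arrows on an xy-graph: \(\mathbf{u}\), \(\mathbf{v}\), \(-\mathbf{v}\), \(-2\mathbf{v}\), \(\mathbf{u} + \mathbf{v}\), \(\mathbf{u} - \mathbf{v}\), and \(\mathbf{u} - 2\mathbf{v}\). Notice that \(\mathbf{u} - \mathbf{v}\) is the vertex of a parallelogram whose other vertices are \(\mathbf{u}\), \(\mathbf{0}\), and \(-\mathbf{v}\).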

3. u and v as in Exercise 1 4. u and v as in Exercise 2 

In Exercises 5 and 6, write a system of equations that is equivalent 
to the given vector equation. 
Use the accompanying figure to write each vector listed in Exercises 7 and 8 as a linear combination of \(\mathbf{u}\) and \(\mathbf{v}\). Is every vector in \(\mathbb{R}^2\) a linear combination of \(\mathbf{u}\) and \(\mathbf{v}\)?



7. Vectors a, b, c, and d 

8. Vectors w, x, y, and z 


In Exercises 11 and 12, determine if \(\mathbf{b}\) is a linear combination of \(\mathbf{a}_1\), \(\mathbf{a}_2\), and \(\mathbf{a}_3\).

In Exercises 13 and 14, determine if \(\mathbf{b}\) is a linear combination of the vectors formed from the columns of the matrix \(A\).

In Exercises 15 and 16, list five vectors in \(\operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2\}\). For each vector, show the weights on \(\mathbf{v}_1\) and \(\mathbf{v}_2\) used to generate the vector and list the three entries of the vector. Do not make a sketch.
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17. Let ai = 

4 

-2 

, a 2 = 

-3 

7 

, and b = 

1 

h 

. For what value(s) of \(h\) is \(\mathbf{b}\) in the plane spanned by \(\mathbf{a}_1\) and \(\mathbf{a}_2\)?
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18. Let Vi = 

i 

O <N 

_1 

, v 2 = 

1 

8 

, and y = 

i 

U> Ui 

1 _ 

. For what value(s) of \(h\) is \(\mathbf{y}\) in the plane generated by \(\mathbf{v}_1\) and \(\mathbf{v}_2\)?

19. Give a geometric description of Span {vi, v 2 } for the vectors 
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Vi = 
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and v 2 = 

3 
20. Give a geometric description of \(\operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2\}\) for the vectors in Exercise 16.


21. Let u = 


2 

1 


and v = 


2 

1 


Show that 


h 

k 


is in 


Span {u, v} for all h and k . 


22. Construct a 3 x 3 matrix A , with nonzero entries, and a vector 
b in R 3 such that b is not in the set spanned by the columns 
of A. 


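a. Is \(\mathbf{b}\) in \(\{\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3\}\)? How many vectors are in \(\{\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3\}\)?

b. Is \(\mathbf{b}\) in \(W\)? How many vectors are in \(W\)?

c. Show that \(\mathbf{a}_1\) is in \(W\). [Hint: Row operations are unnecessary.]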


26. Let A = 



_ i 


10 

5 

, let b = 

3 

i_ 


i 

1 _ 


and let W be 


the set of all linear combinations of the columns of A . 


a. Is b in W? 

b. Show that the third column of A is in W. 



27. A mining company has two mines. One day's operation at mine #1 produces ore that contains 20 metric tons of copper and 550 kilograms of silver, while one day's operation at mine #2 produces ore that contains 30 metric tons of copper and 500 kilograms of silver. Let \(\mathbf{v}_1 = \begin{bmatrix} 20 \\ 550 \end{bmatrix}\) and \(\mathbf{v}_2 = \begin{bmatrix} 30 \\ 500 \end{bmatrix}\). Then \(\mathbf{v}_1\) and \(\mathbf{v}_2\) represent the "output per day" of mine #1 and mine #2, respectively.


In Exercises 23 and 24, mark each statement True or False. Justify each answer.


23. a. Another notation for the vector 



is [ —4 3 ]. 


b. The points in the plane corresponding to 



lie on a line through the origin. 



and 


c. An example of a linear combination of vectors Vi and v 2 
is the vector 

d. The solution set of the linear system whose augmented matrix is \([\,\mathbf{a}_1 \ \mathbf{a}_2 \ \mathbf{a}_3 \ \mathbf{b}\,]\) is the same as the solution set of the equation \(x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + x_3\mathbf{a}_3 = \mathbf{b}\).


e. The set Span{u, v} is always visualized as a plane 
through the origin. 


24. a. Any list of five real numbers is a vector in \(\mathbb{R}^5\).

b. The vector \(\mathbf{u}\) results when a vector \(\mathbf{u} - \mathbf{v}\) is added to the vector \(\mathbf{v}\).

c. The weights \(c_1, \dots, c_p\) in a linear combination \(c_1\mathbf{v}_1 + \cdots + c_p\mathbf{v}_p\) cannot all be zero.

d. When u and v are nonzero vectors, Span {u, v} contains 
the line through u and the origin. 


e. Asking whether the linear system corresponding to 
an augmented matrix [ ai a 2 a 3 b ] has a solution 
amounts to asking whether b is in Span {ai, a 2 , a 3 }. 



1 

1 

o 


4" 


25. Let A ~ 

O <N 

_1 
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6 3 _ 

and b = 

1 

-4 

. Denote the 


columns of A by ai , a 2 , a 3 , and let W = Span {ai , a 2 , a 3 }. 


a. What physical interpretation can be given to the vector \(5\mathbf{v}_1\)?

b. Suppose the company operates mine #1 for \(x_1\) days and mine #2 for \(x_2\) days. Write a vector equation whose solution gives the number of days each mine should operate in order to produce 150 tons of copper and 2825 kilograms of silver. Do not solve the equation.

c. [M] Solve the equation in (b).

28. A steam plant burns two types of coal: anthracite (A) and bituminous (B). For each ton of A burned, the plant produces 27.6 million Btu of heat, 3100 grams (g) of sulfur dioxide, and 250 g of particulate matter (solid-particle pollutants). For each ton of B burned, the plant produces 30.2 million Btu, 6400 g of sulfur dioxide, and 360 g of particulate matter.

a. How much heat does the steam plant produce when it burns \(x_1\) tons of A and \(x_2\) tons of B?

b. Suppose the output of the steam plant is described by a vector that lists the amounts of heat, sulfur dioxide, and particulate matter. Express this output as a linear combination of two vectors, assuming that the plant burns \(x_1\) tons of A and \(x_2\) tons of B.


c. [M] Over a certain time period, the steam plant produced 162 million Btu of heat, 23,610 g of sulfur dioxide, and 1623 g of particulate matter. Determine how many tons of each type of coal the steam plant must have burned. Include a vector equation as part of your solution.

29. Let \(\mathbf{v}_1, \dots, \mathbf{v}_k\) be points in \(\mathbb{R}^3\) and suppose that for \(j = 1, \dots, k\) an object with mass \(m_j\) is located at point \(\mathbf{v}_j\). Physicists call such objects point masses. The total mass of the system of point masses is

\[
m = m_1 + \cdots + m_k
\]

The center of gravity (or center of mass) of the system is

\[
\bar{\mathbf{v}} = \frac{1}{m}\left[ m_1\mathbf{v}_1 + \cdots + m_k\mathbf{v}_k \right]
\]

Compute the center of gravity of the system consisting of the 
following point masses (see the figure): 


Point                      Mass
v_1 = (5, -4, 3)           2 g
v_2 = (4, 3, -2)           5 g
v_3 = (-4, -3, -1)         2 g
v_4 = (-9, 8, 6)           1 g




30. Let \(\bar{\mathbf{v}}\) be the center of mass of a system of point masses located at \(\mathbf{v}_1, \dots, \mathbf{v}_k\) as in Exercise 29. Is \(\bar{\mathbf{v}}\) in \(\operatorname{Span}\{\mathbf{v}_1, \dots, \mathbf{v}_k\}\)? Explain.

31. A thin triangular plate of uniform density and thickness has vertices at \(\mathbf{v}_1 = (0, 1)\), \(\mathbf{v}_2 = (8, 1)\), and \(\mathbf{v}_3 = (2, 4)\), as in the figure below, and the mass of the plate is 3 g.



a. Find the (x, y )-coordinates of the center of mass of the 
plate. This “balance point” of the plate coincides with 
the center of mass of a system consisting of three 1-gram 
point masses located at the vertices of the plate. 

b. Determine how to distribute an additional mass of 6 g at the three vertices of the plate to move the balance point of the plate to (2, 2). [Hint: Let \(w_1\), \(w_2\), and \(w_3\) denote the masses added at the three vertices, so that \(w_1 + w_2 + w_3 = 6\).]

32. Consider the vectors \(\mathbf{v}_1\), \(\mathbf{v}_2\), \(\mathbf{v}_3\), and \(\mathbf{b}\) in \(\mathbb{R}^2\), shown in the figure. Does the equation \(x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + x_3\mathbf{v}_3 = \mathbf{b}\) have a solution? Is the solution unique? Use the figure to explain your answers.



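33. Use the vectors \(\mathbf{u} = (u_1, \dots, u_n)\), \(\mathbf{v} = (v_1, \dots, v_n)\), and \(\mathbf{w} = (w_1, \dots, w_n)\) to verify the following algebraic properties of \(\mathbb{R}^n\).

a. \((\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})\)

b. \(c(\mathbf{u} + \mathbf{v}) = c\mathbf{u} + c\mathbf{v}\) for each scalar \(c\)

34. Use the vector \(\mathbf{u} = (u_1, \dots, u_n)\) to verify the following algebraic properties of \(\mathbb{R}^n\).

a. \(\mathbf{u} + (-\mathbf{u}) = (-\mathbf{u}) + \mathbf{u} = \mathbf{0}\)

b. \(c(d\mathbf{u}) = (cd)\mathbf{u}\) for all scalars \(c\) and \(d\)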


SOLUTIONS TO PRACTICE PROBLEMS 



intersects the plane when h = 5. 



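1. Take arbitrary vectors \(\mathbf{u} = (u_1, \dots, u_n)\) and \(\mathbf{v} = (v_1, \dots, v_n)\) in \(\mathbb{R}^n\), and compute

\[
\begin{aligned}
\mathbf{u} + \mathbf{v} &= (u_1 + v_1, \dots, u_n + v_n) && \text{Definition of vector addition} \\
&= (v_1 + u_1, \dots, v_n + u_n) && \text{Commutativity of addition in } \mathbb{R} \\
&= \mathbf{v} + \mathbf{u} && \text{Definition of vector addition}
\end{aligned}
\]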

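2. The vector \(\mathbf{y}\) belongs to \(\operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}\) if and only if there exist scalars \(x_1, x_2, x_3\) such that

\[
x_1 \begin{bmatrix} 1 \\ -1 \\ -2 \end{bmatrix} + x_2 \begin{bmatrix} 5 \\ -4 \\ -7 \end{bmatrix} + x_3 \begin{bmatrix} -3 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} -4 \\ 3 \\ h \end{bmatrix}
\]

This vector equation is equivalent to a system of three linear equations in three unknowns. If you row reduce the augmented matrix for this system, you find that

\[
\begin{bmatrix} 1 & 5 & -3 & -4 \\ -1 & -4 & 1 & 3 \\ -2 & -7 & 0 & h \end{bmatrix}
\sim
\begin{bmatrix} 1 & 5 & -3 & -4 \\ 0 & 1 & -2 & -1 \\ 0 & 3 & -6 & h - 8 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 5 & -3 & -4 \\ 0 & 1 & -2 & -1 \\ 0 & 0 & 0 & h - 5 \end{bmatrix}
\]

The system is consistent if and only if there is no pivot in the fourth column. That is, \(h - 5\) must be 0. So \(\mathbf{y}\) is in \(\operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}\) if and only if \(h = 5\).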

Remember: The presence of a free variable in a system does not guarantee that the 
system is consistent. 

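3. Since the vectors \(\mathbf{u}\) and \(\mathbf{v}\) are in \(\operatorname{Span}\{\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3\}\), there exist scalars \(c_1, c_2, c_3\) and \(d_1, d_2, d_3\) such that

\[
\mathbf{u} = c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + c_3\mathbf{w}_3 \quad \text{and} \quad \mathbf{v} = d_1\mathbf{w}_1 + d_2\mathbf{w}_2 + d_3\mathbf{w}_3.
\]

Notice

\[
\begin{aligned}
\mathbf{u} + \mathbf{v} &= c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + c_3\mathbf{w}_3 + d_1\mathbf{w}_1 + d_2\mathbf{w}_2 + d_3\mathbf{w}_3 \\
&= (c_1 + d_1)\mathbf{w}_1 + (c_2 + d_2)\mathbf{w}_2 + (c_3 + d_3)\mathbf{w}_3
\end{aligned}
\]

Since \(c_1 + d_1\), \(c_2 + d_2\), and \(c_3 + d_3\) are also scalars, the vector \(\mathbf{u} + \mathbf{v}\) is in \(\operatorname{Span}\{\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3\}\).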


1.4 THE MATRIX EQUATION Ax = b 

A fundamental idea in linear algebra is to view a linear combination of vectors as the 
product of a matrix and a vector. The following definition permits us to rephrase some 
of the concepts of Section 1.3 in new ways. 


DEFINITION 


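If \(A\) is an \(m \times n\) matrix, with columns \(\mathbf{a}_1, \dots, \mathbf{a}_n\), and if \(\mathbf{x}\) is in \(\mathbb{R}^n\), then the product of \(A\) and \(\mathbf{x}\), denoted by \(A\mathbf{x}\), is the linear combination of the columns of \(A\) using the corresponding entries in \(\mathbf{x}\) as weights; that is,

\[
A\mathbf{x} = [\,\mathbf{a}_1 \ \mathbf{a}_2 \ \cdots \ \mathbf{a}_n\,] \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \cdots + x_n\mathbf{a}_n
\]

Note that \(A\mathbf{x}\) is defined only if the number of columns of \(A\) equals the number of entries in \(\mathbf{x}\).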
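The definition translates directly into a short routine. What follows is a sketch under stated assumptions: the function name `matvec_by_columns` and the sample vector (4, 3, 7) are illustrative choices, and the matrix is stored as a list of rows.

```python
def matvec_by_columns(A, x):
    """Compute Ax as x[0]*a_1 + ... + x[n-1]*a_n, where a_j is column j of A."""
    n_rows, n_cols = len(A), len(A[0])
    if len(x) != n_cols:
        # Ax is defined only when the column count of A matches the entry count of x.
        raise ValueError("A has %d columns but x has %d entries" % (n_cols, len(x)))
    result = [0] * n_rows
    for j in range(n_cols):
        column = [A[i][j] for i in range(n_rows)]
        # Add the weighted column x_j * a_j to the running sum.
        result = [r + x[j] * entry for r, entry in zip(result, column)]
    return result

A = [[1, 2, -1],
     [0, -5, 3]]
product = matvec_by_columns(A, [4, 3, 7])
print(product)
```

Here the product is \(4\,(1, 0) + 3\,(2, -5) + 7\,(-1, 3) = (3, 6)\). Passing a vector with the wrong number of entries raises an error, which mirrors the note that \(A\mathbf{x}\) is otherwise undefined.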


EXAMPLE 1 
EXAMPLE 2 For \(\mathbf{v}_1\), \(\mathbf{v}_2\), \(\mathbf{v}_3\) in \(\mathbb{R}^m\), write the linear combination \(3\mathbf{v}_1 - 5\mathbf{v}_2 + 7\mathbf{v}_3\) as a matrix times a vector.

SOLUTION Place \(\mathbf{v}_1\), \(\mathbf{v}_2\), \(\mathbf{v}_3\) into the columns of a matrix \(A\) and place the weights 3, \(-5\), and 7 into a vector \(\mathbf{x}\). That is,

\[
3\mathbf{v}_1 - 5\mathbf{v}_2 + 7\mathbf{v}_3 = [\,\mathbf{v}_1 \ \mathbf{v}_2 \ \mathbf{v}_3\,] \begin{bmatrix} 3 \\ -5 \\ 7 \end{bmatrix} = A\mathbf{x}
\]


Section 1.3 showed how to write a system of linear equations as a vector equation 
involving a linear combination of vectors. For example, the system 


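\[
\begin{aligned}
x_1 + 2x_2 - x_3 &= 4 \\
-5x_2 + 3x_3 &= 1
\end{aligned}
\tag{1}
\]

is equivalent to

\[
x_1 \begin{bmatrix} 1 \\ 0 \end{bmatrix} + x_2 \begin{bmatrix} 2 \\ -5 \end{bmatrix} + x_3 \begin{bmatrix} -1 \\ 3 \end{bmatrix} = \begin{bmatrix} 4 \\ 1 \end{bmatrix}
\tag{2}
\]

As in Example 2, the linear combination on the left side is a matrix times a vector, so that (2) becomes

\[
\begin{bmatrix} 1 & 2 & -1 \\ 0 & -5 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 4 \\ 1 \end{bmatrix}
\tag{3}
\]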

Equation (3) has the form Ax = b. Such an equation is called a matrix equation, 
to distinguish it from a vector equation such as is shown in (2). 

Notice how the matrix in (3) is just the matrix of coefficients of the system (1). 
Similar calculations show that any system of linear equations, or any vector equation 
such as (2), can be written as an equivalent matrix equation in the form Ax = b. This 
simple observation will be used repeatedly throughout the text. 

Here is the formal result. 


THEOREM 3 


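If \(A\) is an \(m \times n\) matrix, with columns \(\mathbf{a}_1, \dots, \mathbf{a}_n\), and if \(\mathbf{b}\) is in \(\mathbb{R}^m\), the matrix equation

\[
A\mathbf{x} = \mathbf{b} \tag{4}
\]

has the same solution set as the vector equation

\[
x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \cdots + x_n\mathbf{a}_n = \mathbf{b} \tag{5}
\]

which, in turn, has the same solution set as the system of linear equations whose augmented matrix is

\[
[\,\mathbf{a}_1 \ \mathbf{a}_2 \ \cdots \ \mathbf{a}_n \ \ \mathbf{b}\,] \tag{6}
\]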


Theorem 3 provides a powerful tool for gaining insight into problems in linear 
algebra, because a system of linear equations may now be viewed in three different 
but equivalent ways: as a matrix equation, as a vector equation, or as a system of linear 
equations. Whenever you construct a mathematical model of a problem in real life, you 
are free to choose whichever viewpoint is most natural. Then you may switch from one 
formulation of a problem to another whenever it is convenient. In any case, the matrix 
equation (4), the vector equation (5), and the system of equations are all solved in the 
same way—by row reducing the augmented matrix (6). Other methods of solution will 
be discussed later. 
FIGURE 1 The columns of \(A = [\,\mathbf{a}_1 \ \mathbf{a}_2 \ \mathbf{a}_3\,]\) span a plane through \(\mathbf{0}\).


Existence of Solutions

The definition of \(A\mathbf{x}\) leads directly to the following useful fact.


The equation \(A\mathbf{x} = \mathbf{b}\) has a solution if and only if \(\mathbf{b}\) is a linear combination of the columns of \(A\).


Section 1.3 considered the existence question, "Is \(\mathbf{b}\) in \(\operatorname{Span}\{\mathbf{a}_1, \dots, \mathbf{a}_n\}\)?" Equivalently, "Is \(A\mathbf{x} = \mathbf{b}\) consistent?" A harder existence problem is to determine whether the equation \(A\mathbf{x} = \mathbf{b}\) is consistent for all possible \(\mathbf{b}\).
Is the equation \(A\mathbf{x} = \mathbf{b}\) consistent for all possible \(b_1, b_2, b_3\)?

SOLUTION Row reduce the augmented matrix for \(A\mathbf{x} = \mathbf{b}\). The last row of the resulting echelon form is

\[
\begin{bmatrix} 0 & 0 & 0 & b_3 + 3b_1 - \tfrac{1}{2}(b_2 + 4b_1) \end{bmatrix}
\]

The third entry in column 4 equals \(b_1 - \tfrac{1}{2}b_2 + b_3\). The equation \(A\mathbf{x} = \mathbf{b}\) is not consistent for every \(\mathbf{b}\) because some choices of \(\mathbf{b}\) can make \(b_1 - \tfrac{1}{2}b_2 + b_3\) nonzero. ■


The reduced matrix in Example 3 provides a description of all \(\mathbf{b}\) for which the equation \(A\mathbf{x} = \mathbf{b}\) is consistent: The entries in \(\mathbf{b}\) must satisfy

\[
b_1 - \tfrac{1}{2}b_2 + b_3 = 0
\]

This is the equation of a plane through the origin in \(\mathbb{R}^3\). The plane is the set of all linear combinations of the three columns of \(A\). See Figure 1.

The equation \(A\mathbf{x} = \mathbf{b}\) in Example 3 fails to be consistent for all \(\mathbf{b}\) because the echelon form of \(A\) has a row of zeros. If \(A\) had a pivot in all three rows, we would not care about the calculations in the augmented column because in this case an echelon form of the augmented matrix could not have a row such as \([\,0 \ 0 \ 0 \ 1\,]\).

In the next theorem, the sentence "The columns of \(A\) span \(\mathbb{R}^m\)" means that every \(\mathbf{b}\) in \(\mathbb{R}^m\) is a linear combination of the columns of \(A\). In general, a set of vectors \(\{\mathbf{v}_1, \dots, \mathbf{v}_p\}\) in \(\mathbb{R}^m\) spans (or generates) \(\mathbb{R}^m\) if every vector in \(\mathbb{R}^m\) is a linear combination of \(\mathbf{v}_1, \dots, \mathbf{v}_p\), that is, if \(\operatorname{Span}\{\mathbf{v}_1, \dots, \mathbf{v}_p\} = \mathbb{R}^m\).


THEOREM 4


Let \(A\) be an \(m \times n\) matrix. Then the following statements are logically equivalent. That is, for a particular \(A\), either they are all true statements or they are all false.

a. For each \(\mathbf{b}\) in \(\mathbb{R}^m\), the equation \(A\mathbf{x} = \mathbf{b}\) has a solution.

b. Each \(\mathbf{b}\) in \(\mathbb{R}^m\) is a linear combination of the columns of \(A\).

c. The columns of \(A\) span \(\mathbb{R}^m\).

d. \(A\) has a pivot position in every row.























Theorem 4 is one of the most useful theorems in this chapter. Statements (a), (b), and (c) are equivalent because of the definition of \(A\mathbf{x}\) and what it means for a set of vectors to span \(\mathbb{R}^m\). The discussion after Example 3 suggests why (a) and (d) are equivalent; a proof is given at the end of the section. The exercises will provide examples of how Theorem 4 is used.

Warning: Theorem 4 is about a coefficient matrix , not an augmented matrix. If an 
augmented matrix [A b ] has a pivot position in every row, then the equation Ax = b 
may or may not be consistent. 
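Statement (d) of Theorem 4 gives a mechanical test, and the warning can be seen numerically. The sketch below is an illustration only: the function names `pivot_count` and `columns_span_Rm` are hypothetical, and the sample matrices are chosen for the demonstration. It counts pivot positions with exact arithmetic.

```python
from fractions import Fraction

def pivot_count(rows):
    """Count the pivot positions of a matrix by reducing it to echelon form."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = 0
    for col in range(len(m[0])):
        # Find a nonzero entry in this column at or below the next pivot row.
        src = next((r for r in range(pivots, len(m)) if m[r][col] != 0), None)
        if src is None:
            continue
        m[pivots], m[src] = m[src], m[pivots]
        for r in range(pivots + 1, len(m)):
            factor = m[r][col] / m[pivots][col]
            m[r] = [a - factor * p for a, p in zip(m[r], m[pivots])]
        pivots += 1
        if pivots == len(m):
            break
    return pivots

def columns_span_Rm(A):
    """Theorem 4(d): the columns of A span R^m exactly when every row has a pivot."""
    return pivot_count(A) == len(A)

A_span = [[1, 2, -1], [0, -5, 3]]                # coefficient matrix, pivot in both rows
A_no_span = [[1, 0, 1], [0, 1, 1], [0, 0, 0]]    # echelon form has a row of zeros
augmented = [[1, 0, 1, 3], [0, 1, 1, 2], [0, 0, 0, 1]]   # [A_no_span  c]

print(columns_span_Rm(A_span), columns_span_Rm(A_no_span), pivot_count(augmented))
```

The augmented matrix at the end has a pivot in every one of its three rows, yet its last row reads \(0 = 1\), so the system is inconsistent. The pivot test must be applied to the coefficient matrix alone.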


Computation of Ax 

The calculations in Example 1 were based on the definition of the product of a matrix A 
and a vector x. The following simple example will lead to a more efficient method for 
calculating the entries in Ax when working problems by hand. 
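EXAMPLE 4 Compute \(A\mathbf{x}\), where \(A = \begin{bmatrix} 2 & 3 & 4 \\ -1 & 5 & -3 \\ 6 & -2 & 8 \end{bmatrix}\) and \(\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\).

SOLUTION From the definition,

\[
\begin{aligned}
\begin{bmatrix} 2 & 3 & 4 \\ -1 & 5 & -3 \\ 6 & -2 & 8 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
&= x_1 \begin{bmatrix} 2 \\ -1 \\ 6 \end{bmatrix} + x_2 \begin{bmatrix} 3 \\ 5 \\ -2 \end{bmatrix} + x_3 \begin{bmatrix} 4 \\ -3 \\ 8 \end{bmatrix} \\
&= \begin{bmatrix} 2x_1 \\ -x_1 \\ 6x_1 \end{bmatrix} + \begin{bmatrix} 3x_2 \\ 5x_2 \\ -2x_2 \end{bmatrix} + \begin{bmatrix} 4x_3 \\ -3x_3 \\ 8x_3 \end{bmatrix} \\
&= \begin{bmatrix} 2x_1 + 3x_2 + 4x_3 \\ -x_1 + 5x_2 - 3x_3 \\ 6x_1 - 2x_2 + 8x_3 \end{bmatrix}
\end{aligned}
\tag{7}
\]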

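The first entry in the product \(A\mathbf{x}\) is a sum of products (sometimes called a dot product), using the first row of \(A\) and the entries in \(\mathbf{x}\). That is,

\[
\begin{bmatrix} 2 & 3 & 4 \\ & & \\ & & \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2x_1 + 3x_2 + 4x_3 \\ \\ \phantom{0} \end{bmatrix}
\]

This matrix shows how to compute the first entry in \(A\mathbf{x}\) directly, without writing down all the calculations shown in (7). Similarly, the second entry in \(A\mathbf{x}\) can be calculated at once by multiplying the entries in the second row of \(A\) by the corresponding entries in \(\mathbf{x}\) and then summing the resulting products:

\[
\begin{bmatrix} & & \\ -1 & 5 & -3 \\ & & \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} \\ -x_1 + 5x_2 - 3x_3 \\ \phantom{0} \end{bmatrix}
\]

Likewise, the third entry in \(A\mathbf{x}\) can be calculated from the third row of \(A\) and the entries in \(\mathbf{x}\). ■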


Row-Vector Rule for Computing Ax

If the product \(A\mathbf{x}\) is defined, then the \(i\)th entry in \(A\mathbf{x}\) is the sum of the products of corresponding entries from row \(i\) of \(A\) and from the vector \(\mathbf{x}\).
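The row-vector rule and the column definition must give the same vector. The following sketch compares the two on the matrix of Example 4; the function names and the numeric vector \(\mathbf{x} = (1, 2, 3)\) are illustrative assumptions, not data from the text.

```python
def matvec_by_rows(A, x):
    """Row-vector rule: entry i of Ax is the sum of products of row i of A with x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def matvec_by_columns(A, x):
    """Definition: Ax is the linear combination of the columns of A with weights x."""
    result = [0] * len(A)
    for j, weight in enumerate(x):
        for i in range(len(A)):
            result[i] += weight * A[i][j]
    return result

A = [[2, 3, 4],
     [-1, 5, -3],
     [6, -2, 8]]
x = [1, 2, 3]

by_rows = matvec_by_rows(A, x)
by_columns = matvec_by_columns(A, x)
print(by_rows, by_columns)
```

For this input the first entry is \(2(1) + 3(2) + 4(3) = 20\), and both routines return the same vector, as the rule requires.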






































EXAMPLE 5 
By definition, the matrix in Example 5(c) with 1's on the diagonal and 0's elsewhere is called an identity matrix and is denoted by \(I\). The calculation in part (c) shows that \(I\mathbf{x} = \mathbf{x}\) for every \(\mathbf{x}\) in \(\mathbb{R}^3\). There is an analogous \(n \times n\) identity matrix, sometimes written as \(I_n\). As in part (c), \(I_n\mathbf{x} = \mathbf{x}\) for every \(\mathbf{x}\) in \(\mathbb{R}^n\).


Properties of the Matrix-Vector Product Ax 

The facts in the next theorem are important and will be used throughout the text. The proof relies on the definition of \(A\mathbf{x}\) and the algebraic properties of \(\mathbb{R}^n\).


THEOREM 5


If \(A\) is an \(m \times n\) matrix, \(\mathbf{u}\) and \(\mathbf{v}\) are vectors in \(\mathbb{R}^n\), and \(c\) is a scalar, then:


a. \(A(\mathbf{u} + \mathbf{v}) = A\mathbf{u} + A\mathbf{v}\);

b. \(A(c\mathbf{u}) = c(A\mathbf{u})\).


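PROOF For simplicity, take \(n = 3\), \(A = [\,\mathbf{a}_1 \ \mathbf{a}_2 \ \mathbf{a}_3\,]\), and \(\mathbf{u}\), \(\mathbf{v}\) in \(\mathbb{R}^3\). (The proof of the general case is similar.) For \(i = 1, 2, 3\), let \(u_i\) and \(v_i\) be the \(i\)th entries in \(\mathbf{u}\) and \(\mathbf{v}\), respectively. To prove statement (a), compute \(A(\mathbf{u} + \mathbf{v})\) as a linear combination of the columns of \(A\) using the entries in \(\mathbf{u} + \mathbf{v}\) as weights.

\[
\begin{aligned}
A(\mathbf{u} + \mathbf{v}) &= [\,\mathbf{a}_1 \ \mathbf{a}_2 \ \mathbf{a}_3\,] \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ u_3 + v_3 \end{bmatrix} \\
&= (u_1 + v_1)\mathbf{a}_1 + (u_2 + v_2)\mathbf{a}_2 + (u_3 + v_3)\mathbf{a}_3 \\
&= (u_1\mathbf{a}_1 + u_2\mathbf{a}_2 + u_3\mathbf{a}_3) + (v_1\mathbf{a}_1 + v_2\mathbf{a}_2 + v_3\mathbf{a}_3) \\
&= A\mathbf{u} + A\mathbf{v}
\end{aligned}
\]

In the second line, the weights are the entries in \(\mathbf{u} + \mathbf{v}\) and the vectors are the columns of \(A\).

To prove statement (b), compute \(A(c\mathbf{u})\) as a linear combination of the columns of \(A\) using the entries in \(c\mathbf{u}\) as weights.

\[
\begin{aligned}
A(c\mathbf{u}) &= [\,\mathbf{a}_1 \ \mathbf{a}_2 \ \mathbf{a}_3\,] \begin{bmatrix} cu_1 \\ cu_2 \\ cu_3 \end{bmatrix} \\
&= (cu_1)\mathbf{a}_1 + (cu_2)\mathbf{a}_2 + (cu_3)\mathbf{a}_3 \\
&= c(u_1\mathbf{a}_1) + c(u_2\mathbf{a}_2) + c(u_3\mathbf{a}_3) \\
&= c(u_1\mathbf{a}_1 + u_2\mathbf{a}_2 + u_3\mathbf{a}_3) \\
&= c(A\mathbf{u})
\end{aligned}
\]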
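Both properties in Theorem 5 can be spot-checked numerically. A check on one example is not a proof, only a sanity test of the algebra above. The sketch assumes the data \(A = \begin{bmatrix} 2 & 5 \\ 3 & 1 \end{bmatrix}\), \(\mathbf{u} = (4, -1)\), \(\mathbf{v} = (-3, 5)\), and an arbitrarily chosen scalar \(c = 7\); the helper names are hypothetical.

```python
def matvec(A, x):
    """Compute Ax with the row-vector rule."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def add(u, v):
    """Entry-by-entry vector sum."""
    return [a + b for a, b in zip(u, v)]

def scale(c, u):
    """Scalar multiple c*u."""
    return [c * a for a in u]

A = [[2, 5],
     [3, 1]]
u = [4, -1]
v = [-3, 5]
c = 7

# Property (a): A(u + v) compared with Au + Av.
lhs_a = matvec(A, add(u, v))
rhs_a = add(matvec(A, u), matvec(A, v))

# Property (b): A(cu) compared with c(Au).
lhs_b = matvec(A, scale(c, u))
rhs_b = scale(c, matvec(A, u))

print(lhs_a, rhs_a, lhs_b, rhs_b)
```

Each pair agrees: both sides of (a) equal \((22, 7)\) and both sides of (b) equal \((21, 77)\).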
NUMERICAL NOTE

To optimize a computer algorithm to compute Ax , the sequence of calculations 
should involve data stored in contiguous memory locations. The most widely 
used professional algorithms for matrix computations are written in Fortran, a 
language that stores a matrix as a set of columns. Such algorithms compute Ax as 
a linear combination of the columns of A. In contrast, if a program is written in 
the popular language C, which stores matrices by rows, Ax should be computed 
via the alternative rule that uses the rows of A . 


PROOF OF THEOREM 4 As was pointed out after Theorem 4, statements (a), (b), and 
(c) are logically equivalent. So, it suffices to show (for an arbitrary matrix A) that (a) 
and (d) are either both true or both false. This will tie all four statements together. 

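Let \(U\) be an echelon form of \(A\). Given \(\mathbf{b}\) in \(\mathbb{R}^m\), we can row reduce the augmented matrix \([\,A \ \ \mathbf{b}\,]\) to an augmented matrix \([\,U \ \ \mathbf{d}\,]\) for some \(\mathbf{d}\) in \(\mathbb{R}^m\):

\[
[\,A \ \ \mathbf{b}\,] \sim \cdots \sim [\,U \ \ \mathbf{d}\,]
\]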

If statement (d) is true, then each row of U contains a pivot position and there can be no 
pivot in the augmented column. So Ax = b has a solution for any b, and (a) is true. If (d) 
is false, the last row of U is all zeros. Let d be any vector with a 1 in its last entry. Then 
[U d ] represents an inconsistent system. Since row operations are reversible, [U d ] 
can be transformed into the form [A b ] . The new system Ax = b is also inconsistent, 
and (a) is false. ■ 


PRACTICE PROBLEMS 
1. Let \(A = \begin{bmatrix} 1 & 5 & -2 & 0 \\ -3 & 1 & 9 & -5 \\ 4 & -8 & -1 & 7 \end{bmatrix}\), \(\mathbf{p} = \begin{bmatrix} 3 \\ -2 \\ 0 \\ -4 \end{bmatrix}\), and \(\mathbf{b} = \begin{bmatrix} -7 \\ 9 \\ 0 \end{bmatrix}\). It can be shown that \(\mathbf{p}\) is a solution of \(A\mathbf{x} = \mathbf{b}\). Use this fact to exhibit \(\mathbf{b}\) as a specific linear combination of the columns of \(A\).


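2. Let \(A = \begin{bmatrix} 2 & 5 \\ 3 & 1 \end{bmatrix}\), \(\mathbf{u} = \begin{bmatrix} 4 \\ -1 \end{bmatrix}\), and \(\mathbf{v} = \begin{bmatrix} -3 \\ 5 \end{bmatrix}\). Verify Theorem 5(a) in this case by computing \(A(\mathbf{u} + \mathbf{v})\) and \(A\mathbf{u} + A\mathbf{v}\).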


3 . Construct a 3 x 3 matrix A and vectors b and c in R 3 so that Ax = b has a solution, 
but Ax = c does not. 


1.4 EXERCISES 


Compute the products in Exercises 1-4 using (a) the definition, as 
in Example 1, and (b) the row-vector rule for computing Ax. If a 
product is undefined, explain why. 
In Exercises 5-8, use the definition of \(A\mathbf{x}\) to write the matrix equation as a vector equation, or vice versa.
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A = 

-1 

-1 

-I 

1 

B = 

0 

1 

1 

-5 

X\ 

7 

-5 

0 

— 

0 

0 

-4 

2 

-8 

1 

2 

-3 

7 


-4 


1 


2 


-7 


2 

0 

3 

-1 


-2 

-8 

2 

-1 


8 . z 


i 


4 

2 



-4 


-5 


3 


4 

+ z 2 

5 

+ £3 

4 

+ Z 4 

0 

— 

13 


In Exercises 9 and 10, write the system first as a vector equation 
and then as a matrix equation. 


9.
\[
\begin{aligned}
3x_1 + x_2 - 5x_3 &= 9 \\
x_2 + 4x_3 &= 0
\end{aligned}
\]

10.
\[
\begin{aligned}
8x_1 - x_2 &= 4 \\
5x_1 + 4x_2 &= 1 \\
x_1 - 3x_2 &= 2
\end{aligned}
\]

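Given \(A\) and \(\mathbf{b}\) in Exercises 11 and 12, write the augmented matrix for the linear system that corresponds to the matrix equation \(A\mathbf{x} = \mathbf{b}\). Then solve the system and write the solution as a vector.

11. \(A = \begin{bmatrix} 1 & 2 & 4 \\ 0 & 1 & 5 \\ -2 & -4 & -3 \end{bmatrix}\), \(\mathbf{b} = \begin{bmatrix} -2 \\ 2 \\ 9 \end{bmatrix}\)

12. \(A = \begin{bmatrix} 1 & 2 & 1 \\ -3 & -1 & 2 \\ 0 & 5 & 3 \end{bmatrix}\), \(\mathbf{b} = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}\)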

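13. Let \(\mathbf{u} = \begin{bmatrix} 0 \\ 4 \\ 4 \end{bmatrix}\) and \(A = \begin{bmatrix} 3 & -5 \\ -2 & 6 \\ 1 & 1 \end{bmatrix}\). Is \(\mathbf{u}\) in the plane in \(\mathbb{R}^3\) spanned by the columns of \(A\)? (See the figure.) Why or why not?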


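Plane spanned by the columns of A

14. Let \(\mathbf{u} = \begin{bmatrix} 2 \\ -3 \\ 2 \end{bmatrix}\) and \(A = \begin{bmatrix} 5 & 8 & 7 \\ 0 & 1 & -1 \\ 1 & 3 & 0 \end{bmatrix}\). Is \(\mathbf{u}\) in the subset of \(\mathbb{R}^3\) spanned by the columns of \(A\)? Why or why not?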


15. Let A = 


2 -1 
6 3 


andb = 


b\ 

b 2 


. Show that the equation 


Ax = b does not have a solution for all possible b, and 
describe the set of all b for which Ax = b does have a 
solution. 


16. Repeat Exercise 15: \(A = \begin{bmatrix} 1 & -3 & -4 \\ -3 & 2 & 6 \\ 5 & -1 & -8 \end{bmatrix}\), \(\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}\)


Exercises 17-20 refer to the matrices A and B below. Make 
appropriate calculations that justify your answers and mention an 
appropriate theorem. 


17. How many rows of \(A\) contain a pivot position? Does the equation \(A\mathbf{x} = \mathbf{b}\) have a solution for each \(\mathbf{b}\) in \(\mathbb{R}^4\)?

18. Do the columns of \(B\) span \(\mathbb{R}^4\)? Does the equation \(B\mathbf{x} = \mathbf{y}\) have a solution for each \(\mathbf{y}\) in \(\mathbb{R}^4\)?

19. Can each vector in \(\mathbb{R}^4\) be written as a linear combination of the columns of the matrix \(A\) above? Do the columns of \(A\) span \(\mathbb{R}^4\)?

20. Can every vector in \(\mathbb{R}^4\) be written as a linear combination of the columns of the matrix \(B\) above? Do the columns of \(B\) span \(\mathbb{R}^3\)?

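21. Let \(\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \\ -1 \\ 0 \end{bmatrix}\), \(\mathbf{v}_2 = \begin{bmatrix} 0 \\ -1 \\ 0 \\ 1 \end{bmatrix}\), \(\mathbf{v}_3 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ -1 \end{bmatrix}\). Does \(\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}\) span \(\mathbb{R}^4\)? Why or why not?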


22. Let Vi = 


0 


0" 


4" 

1 

K> O 

1_ 

, V 2 = 

-3 

8 

, v 3 = 

-1 

-5 


Does {vi,v 2 ,v 3 } span R 3 ? Why or why not? 

In Exercises 23 and 24, mark each statement True or False. Justify 
each answer. 

23. a. The equation Ax = b is referred to as a vector equation. 

b. A vector b is a linear combination of the columns of a 
matrix A if and only if the equation Ax = b has at least 
one solution. 

c. The equation \(A\mathbf{x} = \mathbf{b}\) is consistent if the augmented matrix \([\,A \ \ \mathbf{b}\,]\) has a pivot position in every row.

d. The first entry in the product \(A\mathbf{x}\) is a sum of products.

e. If the columns of an \(m \times n\) matrix \(A\) span \(\mathbb{R}^m\), then the equation \(A\mathbf{x} = \mathbf{b}\) is consistent for each \(\mathbf{b}\) in \(\mathbb{R}^m\).

f. If \(A\) is an \(m \times n\) matrix and if the equation \(A\mathbf{x} = \mathbf{b}\) is inconsistent for some \(\mathbf{b}\) in \(\mathbb{R}^m\), then \(A\) cannot have a pivot position in every row.

24. a. Every matrix equation \(A\mathbf{x} = \mathbf{b}\) corresponds to a vector equation with the same solution set.

b. Any linear combination of vectors can always be written 
in the form Ax for a suitable matrix A and vector x. 

c. The solution set of a linear system whose augmented 
matrix is [ ai a 2 a 3 b ] is the same as the solution 
set of Ax = b, if A = [ ai a 2 a 3 ]. 

d. If the equation Ax = b is inconsistent, then b is not in the 
set spanned by the columns of A . 

e. If the augmented matrix [A b ] has a pivot position in 
every row, then the equation Ax = b is inconsistent. 
f. If \(A\) is an \(m \times n\) matrix whose columns do not span \(\mathbb{R}^m\), then the equation \(A\mathbf{x} = \mathbf{b}\) is inconsistent for some \(\mathbf{b}\) in \(\mathbb{R}^m\).

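25. Note that

\[
\begin{bmatrix} 4 & -3 & 1 \\ 5 & -2 & 5 \\ -6 & 2 & -3 \end{bmatrix} \begin{bmatrix} -3 \\ -1 \\ 2 \end{bmatrix} = \begin{bmatrix} -7 \\ -3 \\ 10 \end{bmatrix}
\]

Use this fact (and no row operations) to find scalars \(c_1, c_2, c_3\) such that

\[
\begin{bmatrix} -7 \\ -3 \\ 10 \end{bmatrix} = c_1 \begin{bmatrix} 4 \\ 5 \\ -6 \end{bmatrix} + c_2 \begin{bmatrix} -3 \\ -2 \\ 2 \end{bmatrix} + c_3 \begin{bmatrix} 1 \\ 5 \\ -3 \end{bmatrix}
\]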


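26. Let \(\mathbf{u} = \begin{bmatrix} 7 \\ 2 \\ 5 \end{bmatrix}\), \(\mathbf{v} = \begin{bmatrix} 3 \\ 1 \\ 3 \end{bmatrix}\), and \(\mathbf{w} = \begin{bmatrix} 6 \\ 1 \\ 0 \end{bmatrix}\). It can be shown that \(3\mathbf{u} - 5\mathbf{v} - \mathbf{w} = \mathbf{0}\). Use this fact (and no row operations) to find \(x_1\) and \(x_2\) that satisfy the equation

\[
\begin{bmatrix} 7 & 3 \\ 2 & 1 \\ 5 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 6 \\ 1 \\ 0 \end{bmatrix}
\]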

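27. Let \(\mathbf{q}_1\), \(\mathbf{q}_2\), \(\mathbf{q}_3\), and \(\mathbf{v}\) represent vectors in \(\mathbb{R}^5\), and let \(x_1\), \(x_2\), and \(x_3\) denote scalars. Write the following vector equation as a matrix equation. Identify any symbols you choose to use.

\[
x_1\mathbf{q}_1 + x_2\mathbf{q}_2 + x_3\mathbf{q}_3 = \mathbf{v}
\]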

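28. Rewrite the (numerical) matrix equation below in symbolic form as a vector equation, using symbols \(\mathbf{v}_1, \mathbf{v}_2, \dots\) for the vectors and \(c_1, c_2, \dots\) for scalars. Define what each symbol represents, using the data given in the matrix equation.

\[
\begin{bmatrix} -3 & 5 & -4 & 9 & 7 \\ 5 & 8 & 1 & -2 & -4 \end{bmatrix} \begin{bmatrix} -3 \\ 2 \\ 4 \\ -1 \\ 2 \end{bmatrix} = \begin{bmatrix} 8 \\ -1 \end{bmatrix}
\]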


29. Construct a \(3 \times 3\) matrix, not in echelon form, whose columns span \(\mathbb{R}^3\). Show that the matrix you construct has the desired property.

30. Construct a \(3 \times 3\) matrix, not in echelon form, whose columns do not span \(\mathbb{R}^3\). Show that the matrix you construct has the desired property.

31. Let \(A\) be a \(3 \times 2\) matrix. Explain why the equation \(A\mathbf{x} = \mathbf{b}\) cannot be consistent for all \(\mathbf{b}\) in \(\mathbb{R}^3\). Generalize your argument to the case of an arbitrary \(A\) with more rows than columns.


32. Could a set of three vectors in \(\mathbb{R}^4\) span all of \(\mathbb{R}^4\)? Explain. What about \(n\) vectors in \(\mathbb{R}^m\) when \(n\) is less than \(m\)?

33. Suppose A is a 4 x 3 matrix and b is a vector in R 4 with the 
property that Ax = b has a unique solution. What can you say 
about the reduced echelon form of A? Justify your answer. 

34. Suppose A is a 3 x 3 matrix and b is a vector in R 3 with the 
property that Ax = b has a unique solution. Explain why the 
columns of A must span R 3 . 

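35. Let \(A\) be a \(3 \times 4\) matrix, let \(\mathbf{y}_1\) and \(\mathbf{y}_2\) be vectors in \(\mathbb{R}^3\), and let \(\mathbf{w} = \mathbf{y}_1 + \mathbf{y}_2\). Suppose \(\mathbf{y}_1 = A\mathbf{x}_1\) and \(\mathbf{y}_2 = A\mathbf{x}_2\) for some vectors \(\mathbf{x}_1\) and \(\mathbf{x}_2\) in \(\mathbb{R}^4\). What fact allows you to conclude that the system \(A\mathbf{x} = \mathbf{w}\) is consistent? (Note: \(\mathbf{x}_1\) and \(\mathbf{x}_2\) denote vectors, not scalar entries in vectors.)

36. Let \(A\) be a \(5 \times 3\) matrix, let \(\mathbf{y}\) be a vector in \(\mathbb{R}^3\), and let \(\mathbf{z}\) be a vector in \(\mathbb{R}^5\). Suppose \(A\mathbf{y} = \mathbf{z}\). What fact allows you to conclude that the system \(A\mathbf{x} = 4\mathbf{z}\) is consistent?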


[M] In Exercises 37-40, determine if the columns of the matrix 
span R 4 . 


37. 


7 

5 

6 
7 


2 

-3 

10 

9 


-5 

4 

-2 

2 


8 

-9 

7 

15 


39. 


1— 

5 

-7 

-4 

i 

o> 

38. 

6 

-8 

-7 

5 

4 

-4 

-9 

-9 


-9 

11 

16 

7 

L 

J 






40. 


12 

-7 

11 

-9 

5 

-9 

4 

-8 

7 

-3 

-6 

11 

-7 

3 

-9 

4 

-6 

10 

-5 

12 

OO 

11 

-6 

-7 

13" 

-7 

-8 

5 

6 

-9 

11 

7 

-7 

-9 

-6 

-3 

4 

1 

8 

7 


41. [M] Find a column of the matrix in Exercise 39 that can be deleted and yet have the remaining matrix columns still span \(\mathbb{R}^4\).

42. [M] Find a column of the matrix in Exercise 40 that can be deleted and yet have the remaining matrix columns still span \(\mathbb{R}^4\). Can you delete more than one column?




Mastering Linear Algebra Concepts: Span 1-18 



SOLUTIONS TO PRACTICE PROBLEMS 
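1. The matrix equation

\[
\begin{bmatrix} 1 & 5 & -2 & 0 \\ -3 & 1 & 9 & -5 \\ 4 & -8 & -1 & 7 \end{bmatrix} \begin{bmatrix} 3 \\ -2 \\ 0 \\ -4 \end{bmatrix} = \begin{bmatrix} -7 \\ 9 \\ 0 \end{bmatrix}
\]

is equivalent to the vector equation

\[
3 \begin{bmatrix} 1 \\ -3 \\ 4 \end{bmatrix} - 2 \begin{bmatrix} 5 \\ 1 \\ -8 \end{bmatrix} + 0 \begin{bmatrix} -2 \\ 9 \\ -1 \end{bmatrix} - 4 \begin{bmatrix} 0 \\ -5 \\ 7 \end{bmatrix} = \begin{bmatrix} -7 \\ 9 \\ 0 \end{bmatrix}
\]

which expresses \(\mathbf{b}\) as a linear combination of the columns of \(A\).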


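2.
\[
\begin{aligned}
\mathbf{u} + \mathbf{v} &= \begin{bmatrix} 4 \\ -1 \end{bmatrix} + \begin{bmatrix} -3 \\ 5 \end{bmatrix} = \begin{bmatrix} 1 \\ 4 \end{bmatrix} \\
A(\mathbf{u} + \mathbf{v}) &= \begin{bmatrix} 2 & 5 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 4 \end{bmatrix} = \begin{bmatrix} 2 + 20 \\ 3 + 4 \end{bmatrix} = \begin{bmatrix} 22 \\ 7 \end{bmatrix} \\
A\mathbf{u} + A\mathbf{v} &= \begin{bmatrix} 2 & 5 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 4 \\ -1 \end{bmatrix} + \begin{bmatrix} 2 & 5 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} -3 \\ 5 \end{bmatrix} = \begin{bmatrix} 3 \\ 11 \end{bmatrix} + \begin{bmatrix} 19 \\ -4 \end{bmatrix} = \begin{bmatrix} 22 \\ 7 \end{bmatrix}
\end{aligned}
\]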


Remark: There are, in fact, infinitely many correct solutions to Practice Problem 3. 
When creating matrices to satisfy specified criteria, it is often useful to create 
matrices that are straightforward, such as those already in reduced echelon form. 
Here is one possible solution: 

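3. Let

\[
A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 3 \\ 2 \\ 0 \end{bmatrix}, \quad \text{and} \quad \mathbf{c} = \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}
\]

Notice the reduced echelon form of the augmented matrix corresponding to \(A\mathbf{x} = \mathbf{b}\) is

\[
\begin{bmatrix} 1 & 0 & 1 & 3 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix},
\]

which corresponds to a consistent system, and hence \(A\mathbf{x} = \mathbf{b}\) has solutions. An echelon form of the augmented matrix corresponding to \(A\mathbf{x} = \mathbf{c}\) is

\[
\begin{bmatrix} 1 & 0 & 1 & 3 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & 0 & 1 \end{bmatrix},
\]

which corresponds to an inconsistent system, and hence \(A\mathbf{x} = \mathbf{c}\) does not have any solutions.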


1.5 SOLUTION SETS OF LINEAR SYSTEMS 


Solution sets of linear systems are important objects of study in linear algebra. They 
will appear later in several different contexts. This section uses vector notation to give 
explicit and geometric descriptions of such solution sets. 

Homogeneous Linear Systems 

A system of linear equations is said to be homogeneous if it can be written in the form \(A\mathbf{x} = \mathbf{0}\), where \(A\) is an \(m \times n\) matrix and \(\mathbf{0}\) is the zero vector in \(\mathbb{R}^m\). Such a system \(A\mathbf{x} = \mathbf{0}\) always has at least one solution, namely, \(\mathbf{x} = \mathbf{0}\) (the zero vector in \(\mathbb{R}^n\)). This zero solution is usually called the trivial solution. For a given equation \(A\mathbf{x} = \mathbf{0}\), the important question is whether there exists a nontrivial solution, that is, a nonzero vector \(\mathbf{x}\) that satisfies \(A\mathbf{x} = \mathbf{0}\). The Existence and Uniqueness Theorem in Section 1.2 (Theorem 2) leads immediately to the following fact.


The homogeneous equation Ax = 0 has a nontrivial solution if and only if the 
equation has at least one free variable. 


EXAMPLE 1 Determine if the following homogeneous system has a nontrivial 
solution. Then describe the solution set. 


\[
\begin{aligned}
3x_1 + 5x_2 - 4x_3 &= 0 \\
-3x_1 - 2x_2 + 4x_3 &= 0 \\
6x_1 + x_2 - 8x_3 &= 0
\end{aligned}
\]

SOLUTION Let A be the matrix of coefficients of the system and row reduce the 
augmented matrix [A 0 ] to echelon form: 


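\[
\begin{bmatrix} 3 & 5 & -4 & 0 \\ -3 & -2 & 4 & 0 \\ 6 & 1 & -8 & 0 \end{bmatrix}
\sim
\begin{bmatrix} 3 & 5 & -4 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & -9 & 0 & 0 \end{bmatrix}
\sim
\begin{bmatrix} 3 & 5 & -4 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]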


Since \(x_3\) is a free variable, Ax = 0 has nontrivial solutions (one for each choice of \(x_3\)).
To describe the solution set, continue the row reduction of [A 0] to reduced echelon
form:

\[
\begin{bmatrix} 1 & 0 & -\tfrac{4}{3} & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]




FIGURE 1 


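Solve for the basic variables \(x_1\) and \(x_2\) and obtain \(x_1 = \tfrac{4}{3}x_3\), \(x_2 = 0\), with \(x_3\) free. As a
vector, the general solution of Ax = 0 has the form

\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} \tfrac{4}{3}x_3 \\ 0 \\ x_3 \end{bmatrix}
= x_3 \begin{bmatrix} \tfrac{4}{3} \\ 0 \\ 1 \end{bmatrix}
= x_3\mathbf{v}, \quad \text{where } \mathbf{v} = \begin{bmatrix} \tfrac{4}{3} \\ 0 \\ 1 \end{bmatrix}
\]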

Here \(x_3\) is factored out of the expression for the general solution vector. This shows that
every solution of Ax = 0 in this case is a scalar multiple of v. The trivial solution is
obtained by choosing \(x_3 = 0\). Geometrically, the solution set is a line through 0 in \(\mathbb{R}^3\).
See Figure 1. ■ 
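
The row reduction in Example 1 can be checked with a short Python sketch that uses exact fractions. The helper name `rref` and the variable names are illustrative assumptions, not part of the text. Because the system is homogeneous, the sketch reduces A alone; the augmented column of zeros never changes.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced echelon form, pivot column indices), using exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0  # index of the next pivot row
    for c in range(len(m[0])):
        # Look for a nonzero entry in column c at or below row r.
        k = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if k is None:
            continue  # no pivot in this column, so the variable is free
        m[r], m[k] = m[k], m[r]
        m[r] = [x / m[r][c] for x in m[r]]  # scale the pivot to 1
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * p for a, p in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

# Coefficient matrix of Example 1.
A = [[3, 5, -4], [-3, -2, 4], [6, 1, -8]]
R, pivots = rref(A)
free = [j for j in range(3) if j not in pivots]

# Build the spanning vector: set the free variable to 1 and read off
# each basic variable from its row of the reduced matrix.
v = [Fraction(0)] * 3
v[free[0]] = Fraction(1)
for i, c in enumerate(pivots):
    v[c] = -R[i][free[0]]

print("free variable indices (zero-based):", free)
print("spanning vector v:", v)
```

Here `free` holds the zero-based index of \(x_3\), and `v` reproduces the vector with entries 4/3, 0, 1 found above.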


Notice that a nontrivial solution x can have some zero entries so long as not all of 
its entries are zero. 


EXAMPLE 2 A single linear equation can be treated as a very simple system of 
equations. Describe all solutions of the homogeneous “system” 

\[
10x_1 - 3x_2 - 2x_3 = 0 \tag{1}
\]

SOLUTION There is no need for matrix notation. Solve for the basic variable \(x_1\) in
terms of the free variables. The general solution is \(x_1 = .3x_2 + .2x_3\), with \(x_2\) and \(x_3\)
free. As a vector, the general solution is



FIGURE 2 



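\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} .3x_2 + .2x_3 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} .3x_2 \\ x_2 \\ 0 \end{bmatrix} + \begin{bmatrix} .2x_3 \\ 0 \\ x_3 \end{bmatrix}
= x_2 \underbrace{\begin{bmatrix} .3 \\ 1 \\ 0 \end{bmatrix}}_{\mathbf{u}}
+ x_3 \underbrace{\begin{bmatrix} .2 \\ 0 \\ 1 \end{bmatrix}}_{\mathbf{v}}
\quad (\text{with } x_2, x_3 \text{ free}) \tag{2}
\]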



This calculation shows that every solution of (1) is a linear combination of the vectors 
u and v, shown in (2). That is, the solution set is Span {u, v}. Since neither u nor v is a 
scalar multiple of the other, the solution set is a plane through the origin. See Figure 2. 
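
As a quick numerical companion to Example 2, the following sketch checks that several linear combinations of u and v satisfy equation (1). The function names `lhs` and `combo` and the sample parameter values are assumptions chosen for illustration.

```python
from fractions import Fraction as F

# The vectors u and v from Example 2, written as exact fractions.
u = (F(3, 10), F(1), F(0))
v = (F(1, 5), F(0), F(1))

def lhs(x):
    """Left side of equation (1): 10*x1 - 3*x2 - 2*x3."""
    return 10 * x[0] - 3 * x[1] - 2 * x[2]

def combo(s, t):
    """The linear combination s*u + t*v."""
    return tuple(s * a + t * b for a, b in zip(u, v))

# A few arbitrary parameter pairs (s, t), including the trivial one.
samples = [(0, 0), (1, 0), (0, 1), (2, -3), (F(7, 2), 5)]
residuals = [lhs(combo(s, t)) for s, t in samples]
print(residuals)  # every entry should be zero
```

A finite list of samples is only a sanity check, of course; the calculation in the example is what shows that every solution has this form.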


Examples 1 and 2, along with the exercises, illustrate the fact that the solution 
set of a homogeneous equation Ax = 0 can always be expressed explicitly as 
\(\operatorname{Span}\{\mathbf{v}_1, \dots, \mathbf{v}_p\}\) for suitable vectors \(\mathbf{v}_1, \dots, \mathbf{v}_p\). If the only solution is the zero vector,
then the solution set is Span {0}. If the equation Ax = 0 has only one free variable, the 
solution set is a line through the origin, as in Figure 1. A plane through the origin, as in 
Figure 2, provides a good mental image for the solution set of Ax = 0 when there are 
two or more free variables. Note, however, that a similar figure can be used to visualize 
Span{u, v} even when u and v do not arise as solutions of Ax = 0. See Figure 11 in 
Section 1.3. 


Parametric Vector Form 

The original equation (1) for the plane in Example 2 is an implicit description of the 
plane. Solving this equation amounts to finding an explicit description of the plane as 
the set spanned by u and v. Equation (2) is called a parametric vector equation of the 
plane. Sometimes such an equation is written as 

\[
\mathbf{x} = s\mathbf{u} + t\mathbf{v} \quad (s, t \text{ in } \mathbb{R})
\]

to emphasize that the parameters vary over all real numbers. In Example 1, the equation 
\(\mathbf{x} = x_3\mathbf{v}\) (with \(x_3\) free), or \(\mathbf{x} = t\mathbf{v}\) (with \(t\) in \(\mathbb{R}\)), is a parametric vector equation of a line.
Whenever a solution set is described explicitly with vectors as in Examples 1 and 2, we 
say that the solution is in parametric vector form. 


Solutions of Nonhomogeneous Systems 

When a nonhomogeneous linear system has many solutions, the general solution can be 
written in parametric vector form as one vector plus an arbitrary linear combination of 
vectors that satisfy the corresponding homogeneous system. 

EXAMPLE 3 Describe all solutions of Ax = b, where 



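\[
A = \begin{bmatrix} 3 & 5 & -4 \\ -3 & -2 & 4 \\ 6 & 1 & -8 \end{bmatrix}
\quad \text{and} \quad
\mathbf{b} = \begin{bmatrix} 7 \\ -1 \\ -4 \end{bmatrix}
\]

SOLUTION Here A is the matrix of coefficients from Example 1. Row operations on
[A b] produce

\[
\begin{bmatrix} 3 & 5 & -4 & 7 \\ -3 & -2 & 4 & -1 \\ 6 & 1 & -8 & -4 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 0 & -\tfrac{4}{3} & -1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix},
\qquad
\begin{aligned}
x_1 - \tfrac{4}{3}x_3 &= -1 \\
x_2 &= 2 \\
0 &= 0
\end{aligned}
\]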

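Thus \(x_1 = -1 + \tfrac{4}{3}x_3\), \(x_2 = 2\), and \(x_3\) is free. As a vector, the general solution of Ax = b
has the form

\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} -1 + \tfrac{4}{3}x_3 \\ 2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} -1 \\ 2 \\ 0 \end{bmatrix} + \begin{bmatrix} \tfrac{4}{3}x_3 \\ 0 \\ x_3 \end{bmatrix}
= \underbrace{\begin{bmatrix} -1 \\ 2 \\ 0 \end{bmatrix}}_{\mathbf{p}}
+ x_3 \underbrace{\begin{bmatrix} \tfrac{4}{3} \\ 0 \\ 1 \end{bmatrix}}_{\mathbf{v}}
\]

The equation \(\mathbf{x} = \mathbf{p} + x_3\mathbf{v}\), or, writing \(t\) as a general parameter,

\[
\mathbf{x} = \mathbf{p} + t\mathbf{v} \quad (t \text{ in } \mathbb{R}) \tag{3}
\]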



FIGURE 3 

Adding p to v translates v to v + p. 


describes the solution set of Ax = b in parametric vector form. Recall from Example 1 
that the solution set of Ax = 0 has the parametric vector equation 

\[
\mathbf{x} = t\mathbf{v} \quad (t \text{ in } \mathbb{R}) \tag{4}
\]

[with the same v that appears in (3)]. Thus the solutions of Ax = b are obtained by 
adding the vector p to the solutions of Ax = 0. The vector p itself is just one particular 
solution of Ax = b [corresponding to t = 0 in (3)]. ■ 
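
The relationship between equations (3) and (4) can be tested numerically. The sketch below, with hypothetical helper names `matvec` and `solution`, checks that p + tv solves Ax = b for a few values of t, and that the difference of two such solutions solves Ax = 0.

```python
from fractions import Fraction as F

A = [[3, 5, -4], [-3, -2, 4], [6, 1, -8]]
b = [7, -1, -4]
p = [F(-1), F(2), F(0)]    # particular solution from Example 3
v = [F(4, 3), F(0), F(1)]  # spans the solution set of Ax = 0

def matvec(M, x):
    """Multiply the matrix M (a list of rows) by the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in M]

def solution(t):
    """The point p + t*v on the translated line."""
    return [pi + t * vi for pi, vi in zip(p, v)]

# Sample parameter values; any real t should work.
ts = [0, 1, -2, F(3, 4)]
on_line = all(matvec(A, solution(t)) == b for t in ts)

# The difference of two solutions of Ax = b solves the homogeneous equation.
w1, w2 = solution(1), solution(-2)
difference = [a - c for a, c in zip(w1, w2)]
homogeneous = matvec(A, difference) == [0, 0, 0]

print(on_line, homogeneous)
```

Using `Fraction` rather than floating-point numbers keeps the comparison with b exact, which matters because of the entry 4/3 in v.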



FIGURE 4 

Translated line. 


To describe the solution set of Ax = b geometrically, we can think of vector 
addition as a translation. Given v and p in \(\mathbb{R}^2\) or \(\mathbb{R}^3\), the effect of adding p to v is
to move v in a direction parallel to the line through p and 0. We say that v is translated
by p to v + p. See Figure 3. If each point on a line L in \(\mathbb{R}^2\) or \(\mathbb{R}^3\) is translated by a
vector p, the result is a line parallel to L. See Figure 4. 

Suppose L is the line through 0 and v, described by equation (4). Adding p to each 
point on L produces the translated line described by equation (3). Note that p is on the 
line in equation (3). We call (3) the equation of the line through p parallel to v. Thus 
the solution set of Ax = b is a line through p parallel to the solution set of Ax = 0. 
Figure 5 illustrates this case. 



FIGURE 5 Parallel solution sets of Ax = b and 
Ax = 0. 


The relation between the solution sets of Ax = b and Ax = 0 shown in Figure 5 
generalizes to any consistent equation Ax = b, although the solution set will be larger 
than a line when there are several free variables. The following theorem gives the precise 
statement. See Exercise 25 for a proof.

THEOREM 6

Suppose the equation Ax = b is consistent for some given b, and let p be a
solution. Then the solution set of Ax = b is the set of all vectors of the form
\(\mathbf{w} = \mathbf{p} + \mathbf{v}_h\), where \(\mathbf{v}_h\) is any solution of the homogeneous equation Ax = 0.


Theorem 6 says that if Ax = b has a solution, then the solution set is obtained by 
translating the solution set of Ax = 0, using any particular solution p of Ax = b for 
the translation. Figure 6 illustrates the case in which there are two free variables. Even 
when n > 3, our mental image of the solution set of a consistent system Ax = b (with 
\(\mathbf{b} \neq \mathbf{0}\)) is either a single nonzero point or a line or plane not passing through the origin.



FIGURE 6 Parallel solution sets of 
Ax = b and Ax = 0. 


Warning: Theorem 6 and Figure 6 apply only to an equation Ax = b that has at least 
one nonzero solution p. When Ax = b has no solution, the solution set is empty. 

The following algorithm outlines the calculations shown in Examples 1, 2, and 3.

WRITING A SOLUTION SET (OF A CONSISTENT SYSTEM) IN PARAMETRIC VECTOR FORM

1. Row reduce the augmented matrix to reduced echelon form. 

2. Express each basic variable in terms of any free variables appearing in an 
equation. 

3. Write a typical solution x as a vector whose entries depend on the free 
variables, if any. 

4. Decompose x into a linear combination of vectors (with numeric entries) using 
the free variables as parameters. 
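
The four steps above can be sketched as a small Python function. This is one possible implementation under stated assumptions, not the text's own: the name `parametric_vector_form`, its return convention (a particular vector plus a list of direction vectors, or `None` for an inconsistent system), and the use of exact fractions are all choices made here for illustration.

```python
from fractions import Fraction

def parametric_vector_form(A, b):
    """Describe the solution set of Ax = b as p + t_1*d_1 + ... + t_k*d_k.

    Returns (p, [d_1, ..., d_k]), or None when the system is inconsistent.
    """
    n = len(A[0])
    # Step 1: row reduce the augmented matrix [A b] to reduced echelon form.
    M = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    pivots, r = [], 0
    for c in range(n + 1):
        if r == len(M):
            break
        k = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if k is None:
            continue
        M[r], M[k] = M[k], M[r]
        M[r] = [x / M[r][c] for x in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * x for a, x in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    if n in pivots:
        return None  # a pivot in the last column: the system is inconsistent
    # Step 2: basic variables sit in pivot columns; the others are free.
    free = [j for j in range(n) if j not in pivots]
    # Steps 3 and 4: split a typical solution into a constant vector plus
    # one vector per free variable.
    particular = [Fraction(0)] * n
    for i, c in enumerate(pivots):
        particular[c] = M[i][n]
    directions = []
    for j in free:
        d = [Fraction(0)] * n
        d[j] = Fraction(1)
        for i, c in enumerate(pivots):
            d[c] = -M[i][j]
        directions.append(d)
    return particular, directions

# Example 3: one free variable, so the solution set is a line.
A = [[3, 5, -4], [-3, -2, 4], [6, 1, -8]]
p, dirs = parametric_vector_form(A, [7, -1, -4])
print(p, dirs)

# Example 2: a single equation with two free variables gives a plane.
p2, dirs2 = parametric_vector_form([[10, -3, -2]], [0])
print(p2, dirs2)
```

For a homogeneous system the returned particular vector is the zero vector, so the direction vectors alone span the solution set.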


PRACTICE PROBLEMS 


1. Each of the following equations determines a plane in \(\mathbb{R}^3\). Do the two planes
intersect? If so, describe their intersection.

\[
\begin{aligned}
x_1 + 4x_2 - 5x_3 &= 0 \\
2x_1 - x_2 + 8x_3 &= 9
\end{aligned}
\]

2. Write the general solution of \(10x_1 - 3x_2 - 2x_3 = 7\) in parametric vector form, and
relate the solution set to the one found in Example 2.

3. Prove the first part of Theorem 6: Suppose that p is a solution of Ax = b, so that
Ap = b. Let \(\mathbf{v}_h\) be any solution to the homogeneous equation Ax = 0, and let
\(\mathbf{w} = \mathbf{p} + \mathbf{v}_h\). Show that w is a solution to Ax = b.












1.5 EXERCISES 


In Exercises 1-4, determine if the system has a nontrivial solution. 
Try to use as few row operations as possible. 


1. \(2x_1 - 5x_2 + 8x_3 = 0\)
   \(-2x_1 - 7x_2 + x_3 = 0\)
   \(4x_1 + 2x_2 + 7x_3 = 0\)

2. \(x_1 - 3x_2 + 7x_3 = 0\)
   \(-2x_1 + x_2 - 4x_3 = 0\)
   \(x_1 + 2x_2 + 9x_3 = 0\)

3. \(-3x_1 + 5x_2 - 7x_3 = 0\)
   \(-6x_1 + 7x_2 + x_3 = 0\)

4. \(-5x_1 + 7x_2 + 9x_3 = 0\)
   \(x_1 - 2x_2 + 6x_3 = 0\)

In Exercises 5 and 6, follow the method of Examples 1 and 2 
to write the solution set of the given homogeneous system in 
parametric vector form. 

5. \(x_1 + 3x_2 + x_3 = 0\)
   \(-4x_1 - 9x_2 + 2x_3 = 0\)
   \(-3x_2 - 6x_3 = 0\)

6. \(x_1 + 3x_2 - 5x_3 = 0\)
   \(x_1 + 4x_2 - 8x_3 = 0\)
   \(-3x_1 - 7x_2 + 9x_3 = 0\)


In Exercises 7-12, describe all solutions of Ax = 0 in parametric 
vector form, where A is row equivalent to the given matrix. 


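7. \(\begin{bmatrix} 1 & 3 & -3 & 7 \\ 0 & 1 & -4 & 5 \end{bmatrix}\)

8. \(\begin{bmatrix} 1 & -2 & -9 & 5 \\ 0 & 1 & 2 & -6 \end{bmatrix}\)

9. \(\begin{bmatrix} 3 & -9 & 6 \\ -1 & 3 & -2 \end{bmatrix}\)

10. \(\begin{bmatrix} 1 & 3 & 0 & -4 \\ 2 & 6 & 0 & -8 \end{bmatrix}\)

11. \(\begin{bmatrix} 1 & -4 & -2 & 0 & 3 & -5 \\ 0 & 0 & 1 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 & 1 & -4 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}\)

12. \(\begin{bmatrix} 1 & 5 & 2 & -6 & 9 & 0 \\ 0 & 0 & 1 & -7 & 4 & -8 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}\)
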

13. Suppose the solution set of a certain system of linear equations
can be described as \(x_1 = 5 + 4x_3\), \(x_2 = -2 - 7x_3\), with
\(x_3\) free. Use vectors to describe this set as a line in \(\mathbb{R}^3\).

14. Suppose the solution set of a certain system of linear
equations can be described as \(x_1 = 3x_4\), \(x_2 = 8 + x_4\),
\(x_3 = 2 - 5x_4\), with \(x_4\) free. Use vectors to describe this set
as a “line” in \(\mathbb{R}^4\).

15. Follow the method of Example 3 to describe the solutions of 
the following system in parametric vector form. Also, give 
a geometric description of the solution set and compare it to 
that in Exercise 5. 

\[
\begin{aligned}
x_1 + 3x_2 + x_3 &= 1 \\
-4x_1 - 9x_2 + 2x_3 &= -1 \\
-3x_2 - 6x_3 &= -3
\end{aligned}
\]

16. As in Exercise 15, describe the solutions of the following 
system in parametric vector form, and provide a geometric 
comparison with the solution set in Exercise 6. 


\[
\begin{aligned}
x_1 + 3x_2 - 5x_3 &= 4 \\
x_1 + 4x_2 - 8x_3 &= 7 \\
-3x_1 - 7x_2 + 9x_3 &= -6
\end{aligned}
\]

17. Describe and compare the solution sets of \(x_1 + 9x_2 - 4x_3 = 0\)
and \(x_1 + 9x_2 - 4x_3 = -2\).

18. Describe and compare the solution sets of \(x_1 - 3x_2 + 5x_3 = 0\)
and \(x_1 - 3x_2 + 5x_3 = 4\).


In Exercises 19 and 20, find the parametric equation of the line 
through a parallel to b . 


19. a = 



20. a = 




In Exercises 21 and 22, find a parametric equation of the line M 
through p and q. [Hint: M is parallel to the vector q — p. See the 
figure below.] 


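21. \(\mathbf{p} = \begin{bmatrix} 2 \\ 5 \end{bmatrix}\), \(\mathbf{q} = \begin{bmatrix} 3 \\ 1 \end{bmatrix}\)

22. \(\mathbf{p} = \begin{bmatrix} 6 \\ 3 \end{bmatrix}\), \(\mathbf{q} = \begin{bmatrix} 0 \\ -4 \end{bmatrix}\)

[Figure: the line M through p and q, drawn in the \(x_1x_2\)-plane.]
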


In Exercises 23 and 24, mark each statement True or False. Justify 
each answer. 


23. a. A homogeneous equation is always consistent.

b. The equation Ax = 0 gives an explicit description of its
solution set.

c. The homogeneous equation Ax = 0 has the trivial solution
if and only if the equation has at least one free
variable.

d. The equation \(\mathbf{x} = \mathbf{p} + t\mathbf{v}\) describes a line through v parallel
to p.

e. The solution set of Ax = b is the set of all vectors of
the form \(\mathbf{w} = \mathbf{p} + \mathbf{v}_h\), where \(\mathbf{v}_h\) is any solution of the
equation Ax = 0.

24. a. If x is a nontrivial solution of Ax = 0, then every entry in
x is nonzero.

b. The equation \(\mathbf{x} = x_2\mathbf{u} + x_3\mathbf{v}\), with \(x_2\) and \(x_3\) free (and
neither u nor v a multiple of the other), describes a plane
through the origin.

c. The equation Ax = b is homogeneous if the zero vector
is a solution.

d. The effect of adding p to a vector is to move the vector in
a direction parallel to p.

e. The solution set of Ax = b is obtained by translating the
solution set of Ax = 0.

25. Prove the second part of Theorem 6: Let w be any solution of
Ax = b, and define \(\mathbf{v}_h = \mathbf{w} - \mathbf{p}\). Show that \(\mathbf{v}_h\) is a solution
of Ax = 0. This shows that every solution of Ax = b has the
form \(\mathbf{w} = \mathbf{p} + \mathbf{v}_h\), with p a particular solution of Ax = b and
\(\mathbf{v}_h\) a solution of Ax = 0.

26. Suppose Ax = b has a solution. Explain why the solution is 
unique precisely when Ax = 0 has only the trivial solution. 

27. Suppose A is the \(3 \times 3\) zero matrix (with all zero entries).
Describe the solution set of the equation Ax = 0. 


28. If \(\mathbf{b} \neq \mathbf{0}\), can the solution set of Ax = b be a plane through
the origin? Explain.


In Exercises 29-32, (a) does the equation Ax = 0 have a nontrivial
solution and (b) does the equation Ax = b have at least one
solution for every possible b?


29. A is a \(3 \times 3\) matrix with three pivot positions.

30. A is a \(3 \times 3\) matrix with two pivot positions.

31. A is a \(3 \times 2\) matrix with two pivot positions.

32. A is a \(2 \times 4\) matrix with two pivot positions.


33. Given A =

find one nontrivial solution of Ax = 0 by inspection. [Hint: Think
of the equation Ax = 0 written as a vector equation.]

34. Given A =

find one nontrivial solution of Ax = 0 by inspection.

35. Construct a \(3 \times 3\) nonzero matrix A such that the vector
\(\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}\) is a solution of Ax = 0.

36. Construct a \(3 \times 3\) nonzero matrix A such that the vector
\(\begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}\) is a solution of Ax = 0.


37. Construct a \(2 \times 2\) matrix A such that the solution set of the
equation Ax = 0 is the line in \(\mathbb{R}^2\) through (4, 1) and the
origin. Then, find a vector b in \(\mathbb{R}^2\) such that the solution set
of Ax = b is not a line in \(\mathbb{R}^2\) parallel to the solution set of
Ax = 0. Why does this not contradict Theorem 6?

38. Suppose A is a \(3 \times 3\) matrix and y is a vector in \(\mathbb{R}^3\) such that
the equation Ax = y does not have a solution. Does there
exist a vector z in \(\mathbb{R}^3\) such that the equation Ax = z has a
unique solution? Discuss.

39. Let A be an \(m \times n\) matrix and let u be a vector in \(\mathbb{R}^n\) that
satisfies the equation Ax = 0. Show that for any scalar c, the
vector cu also satisfies Ax = 0. [That is, show that \(A(c\mathbf{u}) = \mathbf{0}\).]

40. Let A be an \(m \times n\) matrix, and let u and v be vectors in \(\mathbb{R}^n\)
with the property that \(A\mathbf{u} = \mathbf{0}\) and \(A\mathbf{v} = \mathbf{0}\). Explain why
\(A(\mathbf{u} + \mathbf{v})\) must be the zero vector. Then explain why
\(A(c\mathbf{u} + d\mathbf{v}) = \mathbf{0}\) for each pair of scalars c and d.


SOLUTIONS TO PRACTICE PROBLEMS 


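1. Row reduce the augmented matrix:

\[
\begin{bmatrix} 1 & 4 & -5 & 0 \\ 2 & -1 & 8 & 9 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 4 & -5 & 0 \\ 0 & -9 & 18 & 9 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 0 & 3 & 4 \\ 0 & 1 & -2 & -1 \end{bmatrix}
\]

\[
\begin{aligned}
x_1 + 3x_3 &= 4 \\
x_2 - 2x_3 &= -1
\end{aligned}
\]

Thus \(x_1 = 4 - 3x_3\), \(x_2 = -1 + 2x_3\), with \(x_3\) free. The general solution in parametric
vector form is

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 4 - 3x_3 \\ -1 + 2x_3 \\ x_3 \end{bmatrix}
= \underbrace{\begin{bmatrix} 4 \\ -1 \\ 0 \end{bmatrix}}_{\mathbf{p}}
+ x_3 \underbrace{\begin{bmatrix} -3 \\ 2 \\ 1 \end{bmatrix}}_{\mathbf{v}}
\]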


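The intersection of the two planes is the line through p in the direction of v.

2. The augmented matrix \([\,10 \;\; -3 \;\; -2 \;\; 7\,]\) is row equivalent to \([\,1 \;\; -.3 \;\; -.2 \;\; .7\,]\),
and the general solution is \(x_1 = .7 + .3x_2 + .2x_3\), with \(x_2\) and \(x_3\) free. That is,

\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} .7 + .3x_2 + .2x_3 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} .7 \\ 0 \\ 0 \end{bmatrix}
+ x_2 \begin{bmatrix} .3 \\ 1 \\ 0 \end{bmatrix}
+ x_3 \begin{bmatrix} .2 \\ 0 \\ 1 \end{bmatrix}
= \mathbf{p} + x_2\mathbf{u} + x_3\mathbf{v}
\]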


The solution set of the nonhomogeneous equation Ax = b is the translated plane 
p + Span{u, v}, which passes through p and is parallel to the solution set of the 
homogeneous equation in Example 2. 

3. Using Theorem 5 from Section 1.4, notice

\[
A(\mathbf{p} + \mathbf{v}_h) = A\mathbf{p} + A\mathbf{v}_h = \mathbf{b} + \mathbf{0} = \mathbf{b},
\]

hence \(\mathbf{p} + \mathbf{v}_h\) is a solution to Ax = b.


1.6 APPLICATIONS OF LINEAR SYSTEMS 


You might expect that a real-life problem involving linear algebra would have only 
one solution, or perhaps no solution. The purpose of this section is to show how linear 
systems with many solutions can arise naturally. The applications here come from 
economics, chemistry, and network flow. 


A Homogeneous System in Economics 




The system of 500 equations in 500 variables, mentioned in this chapter’s introduction, 
is now known as a Leontief “input-output” (or “production”) model.¹ Section 2.6 will
examine this model in more detail, when more theory and better notation are available. 
For now, we look at a simpler “exchange model,” also due to Leontief. 

Suppose a nation’s economy is divided into many sectors, such as various manufacturing,
communication, entertainment, and service industries. Suppose that for each sector
we know its total output for one year and we know exactly how this output is divided
or “exchanged” among the other sectors of the economy. Let the total dollar value of a
sector’s output be called the price of that output. Leontief proved the following result.


There exist equilibrium prices that can be assigned to the total outputs of the 
various sectors in such a way that the income of each sector exactly balances its 
expenses. 

The following example shows how to find the equilibrium prices. 


EXAMPLE 1 Suppose an economy consists of the Coal, Electric (power), and Steel 
sectors, and the output of each sector is distributed among the various sectors as shown 
in Table 1, where the entries in a column represent the fractional parts of a sector’s total 
output. 

The second column of Table 1, for instance, says that the total output of the
Electric sector is divided as follows: 40% to Coal, 50% to Steel, and the remaining
10% to Electric. (Electric treats this 10% as an expense it incurs in order to operate its
business.) Since all output must be taken into account, the decimal fractions in each
column must sum to 1.

Denote the prices (i.e., dollar values) of the total annual outputs of the Coal,
Electric, and Steel sectors by \(p_C\), \(p_E\), and \(p_S\), respectively. If possible, find equilibrium
prices that make each sector’s income match its expenditures.

¹ See Wassily W. Leontief, “Input-Output Economics,” Scientific American, October 1951, pp. 15-21.



TABLE 1 A Simple Economy

     Distribution of Output from:
  Coal     Electric     Steel        Purchased by:
   .0         .4          .6         Coal
   .6         .1          .2         Electric
   .4         .5          .2         Steel


SOLUTION A sector looks down a column to see where its output goes, and it looks
across a row to see what it needs as inputs. For instance, the first row of Table 1 says
that Coal receives (and pays for) 40% of the Electric output and 60% of the Steel
output. Since the respective values of the total outputs are \(p_E\) and \(p_S\), Coal must spend
\(.4p_E\) dollars for its share of Electric’s output and \(.6p_S\) for its share of Steel’s output.
Thus Coal’s total expenses are \(.4p_E + .6p_S\). To make Coal’s income, \(p_C\), equal to its
expenses, we want

\[
p_C = .4p_E + .6p_S \tag{1}
\]

The second row of the exchange table shows that the Electric sector spends \(.6p_C\)
for coal, \(.1p_E\) for electricity, and \(.2p_S\) for steel. Hence the income/expense requirement
for Electric is

\[
p_E = .6p_C + .1p_E + .2p_S \tag{2}
\]

Finally, the third row of the exchange table leads to the final requirement:

\[
p_S = .4p_C + .5p_E + .2p_S \tag{3}
\]

To solve the system of equations (1), (2), and (3), move all the unknowns to the left
sides of the equations and combine like terms. [For instance, on the left side of (2),
write \(p_E - .1p_E\) as \(.9p_E\).]

\[
\begin{aligned}
p_C - .4p_E - .6p_S &= 0 \\
-.6p_C + .9p_E - .2p_S &= 0 \\
-.4p_C - .5p_E + .8p_S &= 0
\end{aligned}
\]

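Row reduction is next. For simplicity here, decimals are rounded to two places.

\[
\begin{bmatrix} 1 & -.4 & -.6 & 0 \\ -.6 & .9 & -.2 & 0 \\ -.4 & -.5 & .8 & 0 \end{bmatrix}
\sim
\begin{bmatrix} 1 & -.4 & -.6 & 0 \\ 0 & .66 & -.56 & 0 \\ 0 & -.66 & .56 & 0 \end{bmatrix}
\sim
\begin{bmatrix} 1 & -.4 & -.6 & 0 \\ 0 & .66 & -.56 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]

\[
\sim
\begin{bmatrix} 1 & -.4 & -.6 & 0 \\ 0 & 1 & -.85 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 0 & -.94 & 0 \\ 0 & 1 & -.85 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]

The general solution is \(p_C = .94p_S\), \(p_E = .85p_S\), and \(p_S\) is free. The equilibrium price
vector for the economy has the form

\[
\mathbf{p} = \begin{bmatrix} p_C \\ p_E \\ p_S \end{bmatrix}
= \begin{bmatrix} .94p_S \\ .85p_S \\ p_S \end{bmatrix}
= p_S \begin{bmatrix} .94 \\ .85 \\ 1 \end{bmatrix}
\]
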


Any (nonnegative) choice for \(p_S\) results in a choice of equilibrium prices. For instance,
if we take \(p_S\) to be 100 (or $100 million), then \(p_C = 94\) and \(p_E = 85\). The incomes and
expenditures of each sector will be equal if the output of Coal is priced at $94 million,
that of Electric at $85 million, and that of Steel at $100 million. ■
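
The rounded prices above can be compared with an exact solution. Solving the same system by hand without rounding gives \(p_C = \tfrac{31}{33}p_S\) and \(p_E = \tfrac{28}{33}p_S\); the sketch below takes that as its starting assumption and checks that each sector's expenses equal its income. The names `T`, `expenses`, and `prices` are illustrative.

```python
from fractions import Fraction as F

# Exchange table from Table 1. Each column is the distribution of one
# sector's output; each row lists what one sector purchases.
# Order of sectors: Coal, Electric, Steel.
T = [
    [F(0),     F(4, 10), F(6, 10)],  # purchased by Coal
    [F(6, 10), F(1, 10), F(2, 10)],  # purchased by Electric
    [F(4, 10), F(5, 10), F(2, 10)],  # purchased by Steel
]

def expenses(prices):
    """Total spending of each sector when outputs are priced as given."""
    return [sum(t * p for t, p in zip(row, prices)) for row in T]

# Assumed exact equilibrium, scaled so that p_S = 33.
prices = [F(31), F(28), F(33)]
balanced = expenses(prices) == prices

# Ratios to the Steel price, rounded to two places as in the text.
ratios = [round(float(p / prices[2]), 2) for p in prices]
print(balanced, ratios)
```

The rounded ratios agree with the values .94 and .85 obtained by row reduction with two-place decimals.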

Balancing Chemical Equations 

Chemical equations describe the quantities of substances consumed and produced by 
chemical reactions. For instance, when propane gas burns, the propane (\(\mathrm{C_3H_8}\)) combines
with oxygen (\(\mathrm{O_2}\)) to form carbon dioxide (\(\mathrm{CO_2}\)) and water (\(\mathrm{H_2O}\)), according to an
equation of the form

\[
(x_1)\mathrm{C_3H_8} + (x_2)\mathrm{O_2} \rightarrow (x_3)\mathrm{CO_2} + (x_4)\mathrm{H_2O} \tag{4}
\]

To “balance” this equation, a chemist must find whole numbers \(x_1, \dots, x_4\) such that the
total numbers of carbon (C), hydrogen (H), and oxygen (O) atoms on the left match the
corresponding numbers of atoms on the right (because atoms are neither destroyed nor
created in the reaction).

A systematic method for balancing chemical equations is to set up a vector equation 
that describes the numbers of atoms of each type present in a reaction. Since equation 
(4) involves three types of atoms (carbon, hydrogen, and oxygen), construct a vector in 
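\(\mathbb{R}^3\) for each reactant and product in (4) that lists the numbers of “atoms per molecule,”
as follows:

\[
\mathrm{C_3H_8}: \begin{bmatrix} 3 \\ 8 \\ 0 \end{bmatrix}, \quad
\mathrm{O_2}: \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix}, \quad
\mathrm{CO_2}: \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}, \quad
\mathrm{H_2O}: \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix}
\quad
\begin{matrix} \leftarrow \text{Carbon} \\ \leftarrow \text{Hydrogen} \\ \leftarrow \text{Oxygen} \end{matrix}
\]
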


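To balance equation (4), the coefficients \(x_1, \dots, x_4\) must satisfy

\[
x_1 \begin{bmatrix} 3 \\ 8 \\ 0 \end{bmatrix}
+ x_2 \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix}
= x_3 \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}
+ x_4 \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix}
\]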


To solve, move all the terms to the left (changing the signs in the third and fourth 
vectors): 



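\[
x_1 \begin{bmatrix} 3 \\ 8 \\ 0 \end{bmatrix}
+ x_2 \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix}
+ x_3 \begin{bmatrix} -1 \\ 0 \\ -2 \end{bmatrix}
+ x_4 \begin{bmatrix} 0 \\ -2 \\ -1 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\]
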


Row reduction of the augmented matrix for this equation leads to the general solution

\[
x_1 = \tfrac{1}{4}x_4, \quad x_2 = \tfrac{5}{4}x_4, \quad x_3 = \tfrac{3}{4}x_4, \quad \text{with } x_4 \text{ free}
\]

Since the coefficients in a chemical equation must be integers, take \(x_4 = 4\), in which
case \(x_1 = 1\), \(x_2 = 5\), and \(x_3 = 3\). The balanced equation is

\[
\mathrm{C_3H_8} + 5\,\mathrm{O_2} \rightarrow 3\,\mathrm{CO_2} + 4\,\mathrm{H_2O}
\]
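
The choice \(x_4 = 4\) amounts to clearing denominators in the general solution. A minimal sketch of that step follows, starting from the general solution stated above; the names `imbalance`, `general`, and `coefficients` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

# Atoms per molecule, ordered (carbon, hydrogen, oxygen), as in the vectors above.
C3H8, O2, CO2, H2O = (3, 8, 0), (0, 0, 2), (1, 0, 2), (0, 2, 1)

def imbalance(x1, x2, x3, x4):
    """Atoms on the left minus atoms on the right, for each element."""
    return tuple(
        x1 * a + x2 * b - x3 * c - x4 * d
        for a, b, c, d in zip(C3H8, O2, CO2, H2O)
    )

# General solution from the text with the free variable x4 set to 1.
general = [Fraction(1, 4), Fraction(5, 4), Fraction(3, 4), Fraction(1)]

def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)

# Multiplying by the least common multiple of the denominators gives
# whole-number coefficients.
scale = reduce(lcm, (x.denominator for x in general))
coefficients = [int(x * scale) for x in general]

print(coefficients, imbalance(*coefficients))
```

An all-zero result from `imbalance` means that carbon, hydrogen, and oxygen each appear equally often on both sides of the reaction.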

The equation would also be balanced if, for example, each coefficient were doubled. For
most purposes, however, chemists prefer to use a balanced equation whose coefficients
are the smallest possible whole numbers.

Network Flow

FIGURE 1 A junction, or node.


Systems of linear equations arise naturally when scientists, engineers, or economists 
study the flow of some quantity through a network. For instance, urban planners and 
traffic engineers monitor the pattern of traffic flow in a grid of city streets. Electrical 
engineers calculate current flow through electrical circuits. And economists analyze the 
distribution of products from manufacturers to consumers through a network of wholesalers
and retailers. For many networks, the systems of equations involve hundreds or
even thousands of variables and equations. 

A network consists of a set of points called junctions , or nodes , with lines or arcs 
called branches connecting some or all of the junctions. The direction of flow in each 
branch is indicated, and the flow amount (or rate) is either shown or is denoted by a 
variable. 

The basic assumption of network flow is that the total flow into the network equals 
the total flow out of the network and that the total flow into a junction equals the total 
flow out of the junction. For example, Figure 1 shows 30 units flowing into a junction
through one branch, with \(x_1\) and \(x_2\) denoting the flows out of the junction through other
branches. Since the flow is “conserved” at each junction, we must have \(x_1 + x_2 = 30\).
In a similar fashion, the flow at each junction is described by a linear equation. The 
problem of network analysis is to determine the flow in each branch when partial 
information (such as the flow into and out of the network) is known. 


EXAMPLE 2 The network in Figure 2 shows the traffic flow (in vehicles per hour) 
over several one-way streets in downtown Baltimore during a typical early afternoon. 
Determine the general flow pattern for the network. 


FIGURE 2 Baltimore streets.


SOLUTION Write equations that describe the flow, and then find the general solution 
of the system. Label the street intersections (junctions) and the unknown flows in the 
branches, as shown in Figure 2. At each intersection, set the flow in equal to the flow out. 


Intersection     Flow in                 Flow out
A                300 + 500          =    \(x_1 + x_2\)
B                \(x_2 + x_4\)      =    300 + \(x_3\)
C                100 + 400          =    \(x_4 + x_5\)
D                \(x_1 + x_5\)      =    600

Also, the total flow into the network (500 + 300 + 100 + 400) equals the total flow
out of the network (300 + \(x_3\) + 600), which simplifies to \(x_3 = 400\). Combine this
equation with a rearrangement of the first four equations to obtain the following system
of equations:

\[
\begin{aligned}
x_1 + x_2 &= 800 \\
x_2 - x_3 + x_4 &= 300 \\
x_4 + x_5 &= 500 \\
x_1 + x_5 &= 600 \\
x_3 &= 400
\end{aligned}
\]


Row reduction of the associated augmented matrix leads to 


\[
\begin{aligned}
x_1 + x_5 &= 600 \\
x_2 - x_5 &= 200 \\
x_3 &= 400 \\
x_4 + x_5 &= 500
\end{aligned}
\]

The general flow pattern for the network is described by

\[
\begin{cases}
x_1 = 600 - x_5 \\
x_2 = 200 + x_5 \\
x_3 = 400 \\
x_4 = 500 - x_5 \\
x_5 \text{ is free}
\end{cases}
\]



A negative flow in a network branch corresponds to flow in the direction opposite 
to that shown on the model. Since the streets in this problem are one-way, none of the 
variables here can be negative. This fact leads to certain limitations on the possible 
values of the variables. For instance, \(x_5 \le 500\) because \(x_4\) cannot be negative. Other
constraints on the variables are considered in Practice Problem 2. ■ 
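
The general flow pattern of Example 2 can be explored with a short sketch. The function names `flows` and `junctions_balanced`, and the sampled values of \(x_5\), are assumptions made for illustration; the junction equations are the ones listed in the example.

```python
def flows(x5):
    """General flow pattern from Example 2, parameterized by the free variable x5."""
    return {"x1": 600 - x5, "x2": 200 + x5, "x3": 400, "x4": 500 - x5, "x5": x5}

def junctions_balanced(f):
    """Check that flow in equals flow out at intersections A, B, C, and D."""
    return (
        300 + 500 == f["x1"] + f["x2"]          # intersection A
        and f["x2"] + f["x4"] == 300 + f["x3"]  # intersection B
        and 100 + 400 == f["x4"] + f["x5"]      # intersection C
        and f["x1"] + f["x5"] == 600            # intersection D
    )

# Every sampled value of x5 balances the junctions, but only some keep
# all flows nonnegative, as one-way streets require.
candidates = range(-100, 701, 100)
always_balanced = all(junctions_balanced(flows(x5)) for x5 in candidates)
feasible = [x5 for x5 in candidates if all(v >= 0 for v in flows(x5).values())]

print(always_balanced, feasible)
```

The feasible samples lie between 0 and 500, which matches the limitation on \(x_5\) noted above.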


PRACTICE PROBLEMS 

1. Suppose an economy has three sectors: Agriculture, Mining, and Manufacturing. 
Agriculture sells 5% of its output to Mining and 30% to Manufacturing, and retains 
the rest. Mining sells 20% of its output to Agriculture and 70% to Manufacturing, 
and retains the rest. Manufacturing sells 20% of its output to Agriculture and 30% to 
Mining, and retains the rest. Determine the exchange table for this economy, where 
the columns describe how the output of each sector is exchanged among the three 
sectors. 

2. Consider the network flow studied in Example 2. Determine the possible range of
values of \(x_1\) and \(x_2\). [Hint: The example showed that \(x_5 \le 500\). What does this imply
about \(x_1\) and \(x_2\)? Also, use the fact that \(x_5 \ge 0\).]

1.6 EXERCISES

1. Suppose an economy has only two sectors, Goods and Services.
Each year, Goods sells 80% of its output to Services
and keeps the rest, while Services sells 70% of its output to
Goods and retains the rest. Find equilibrium prices for the
annual outputs of the Goods and Services sectors that make
each sector’s income match its expenditures.




2. Find another set of equilibrium prices for the economy in
Example 1. Suppose the same economy used Japanese yen
instead of dollars to measure the value of the various sectors’
outputs. Would this change the problem in any way?
Discuss.

3. Consider an economy with three sectors, Chemicals & Metals,
Fuels & Power, and Machinery. Chemicals sells 30% of
its output to Fuels and 50% to Machinery and retains the
rest. Fuels sells 80% of its output to Chemicals and 10%
to Machinery and retains the rest. Machinery sells 40% to
Chemicals and 40% to Fuels and retains the rest.

a. Construct the exchange table for this economy. 

b. Develop a system of equations that leads to prices at 
which each sector’s income matches its expenses. Then 
write the augmented matrix that can be row reduced to 
find these prices. 

c. [M] Find a set of equilibrium prices when the price for 
the Machinery output is 100 units. 

4. Suppose an economy has four sectors, Agriculture (A), Energy (E),
Manufacturing (M), and Transportation (T). Sector
A sells 10% of its output to E and 25% to M and retains the
rest. Sector E sells 30% of its output to A, 35% to M, and 25%
to T and retains the rest. Sector M sells 30% of its output to
A, 15% to E, and 40% to T and retains the rest. Sector T sells
20% of its output to A, 10% to E, and 30% to M and retains
the rest.

a. Construct the exchange table for this economy. 

b. [M] Find a set of equilibrium prices for the economy. 

Balance the chemical equations in Exercises 5-10 using the vector 
equation approach discussed in this section. 

5. Boron sulfide reacts violently with water to form boric acid
and hydrogen sulfide gas (the smell of rotten eggs). The
unbalanced equation is

\[
\mathrm{B_2S_3} + \mathrm{H_2O} \rightarrow \mathrm{H_3BO_3} + \mathrm{H_2S}
\]

[For each compound, construct a vector that lists the numbers
of atoms of boron, sulfur, hydrogen, and oxygen.]

6. When solutions of sodium phosphate and barium nitrate are
mixed, the result is barium phosphate (as a precipitate) and
sodium nitrate. The unbalanced equation is

\[
\mathrm{Na_3PO_4} + \mathrm{Ba(NO_3)_2} \rightarrow \mathrm{Ba_3(PO_4)_2} + \mathrm{NaNO_3}
\]

[For each compound, construct a vector that lists the numbers
of atoms of sodium (Na), phosphorus, oxygen, barium,
and nitrogen. For instance, barium nitrate corresponds to
(0, 0, 6, 1, 2).]

7. Alka-Seltzer contains sodium bicarbonate (\(\mathrm{NaHCO_3}\)) and
citric acid (\(\mathrm{H_3C_6H_5O_7}\)). When a tablet is dissolved in water,
the following reaction produces sodium citrate, water, and
carbon dioxide (gas):

\[
\mathrm{NaHCO_3} + \mathrm{H_3C_6H_5O_7} \rightarrow \mathrm{Na_3C_6H_5O_7} + \mathrm{H_2O} + \mathrm{CO_2}
\]

8. The following reaction between potassium permanganate
(\(\mathrm{KMnO_4}\)) and manganese sulfate in water produces manganese
dioxide, potassium sulfate, and sulfuric acid:

\[
\mathrm{KMnO_4} + \mathrm{MnSO_4} + \mathrm{H_2O} \rightarrow \mathrm{MnO_2} + \mathrm{K_2SO_4} + \mathrm{H_2SO_4}
\]

[For each compound, construct a vector that lists the numbers
of atoms of potassium (K), manganese, oxygen, sulfur, and
hydrogen.]

9. [M] If possible, use exact arithmetic or rational format for
calculations in balancing the following chemical reaction:

\[
\mathrm{PbN_6} + \mathrm{CrMn_2O_8} \rightarrow \mathrm{Pb_3O_4} + \mathrm{Cr_2O_3} + \mathrm{MnO_2} + \mathrm{NO}
\]

10. [M] The chemical reaction below can be used in some industrial
processes, such as the production of arsine (\(\mathrm{AsH_3}\)). Use
exact arithmetic or rational format for calculations to balance
this equation.

\[
\mathrm{MnS} + \mathrm{As_2Cr_{10}O_{35}} + \mathrm{H_2SO_4}
\rightarrow \mathrm{HMnO_4} + \mathrm{AsH_3} + \mathrm{CrS_3O_{12}} + \mathrm{H_2O}
\]

11. Find the general flow pattern of the network shown in the
figure. Assuming that the flows are all nonnegative, what is
the largest possible value for \(x_3\)?

12. a. Find the general traffic pattern in the freeway network
shown in the figure. (Flow rates are in cars/minute.)

b. Describe the general traffic pattern when the road whose
flow is \(x_4\) is closed.

c. When \(x_4 = 0\), what is the minimum value of \(x_1\)?

13. a. Find the general flow pattern in the network shown in the
figure.

b. Assuming that the flow must be in the directions indicated,
find the minimum flows in the branches denoted
by \(x_2\), \(x_3\), \(x_4\), and \(x_5\).

14. Intersections in England are often constructed as one-way
“roundabouts,” such as the one shown in the figure. Assume
that traffic must travel in the directions shown. Find the general
solution of the network flow. Find the smallest possible
value for \(x_6\).

SOLUTIONS TO PRACTICE PROBLEMS 

1. Write the percentages as decimals. Since all output must be taken into account, each
column must sum to 1. This fact helps to fill in any missing entries.

         Distribution of Output from:
Agriculture    Mining    Manufacturing    Purchased by:
    .65          .20          .20         Agriculture
    .05          .10          .30         Mining
    .30          .70          .50         Manufacturing

2. Since $x_5 \le 500$, the equations D and A for $x_1$ and $x_2$ imply that $x_1 \ge 100$
and $x_2 \le 700$. The fact that $x_5 \ge 0$ implies that $x_1 \le 600$ and $x_2 \ge 200$. So,
$100 \le x_1 \le 600$, and $200 \le x_2 \le 700$.


1.7 LINEAR INDEPENDENCE 


The homogeneous equations in Section 1.5 can be studied from a different perspective
by writing them as vector equations. In this way, the focus shifts from the unknown
solutions of $A\mathbf{x} = \mathbf{0}$ to the vectors that appear in the vector equations.
For instance, consider the equation

\[ x_1 \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + x_2 \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} + x_3 \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \tag{1} \]

This equation has a trivial solution, of course, where $x_1 = x_2 = x_3 = 0$. As in
Section 1.5, the main issue is whether the trivial solution is the only one.


DEFINITION 


An indexed set of vectors $\{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ in $\mathbb{R}^n$ is said to be linearly independent
if the vector equation

\[ x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + \cdots + x_p\mathbf{v}_p = \mathbf{0} \]

has only the trivial solution. The set $\{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ is said to be linearly dependent
if there exist weights $c_1, \dots, c_p$, not all zero, such that

\[ c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_p\mathbf{v}_p = \mathbf{0} \tag{2} \]

Equation (2) is called a linear dependence relation among $\mathbf{v}_1, \dots, \mathbf{v}_p$ when the
weights are not all zero. An indexed set is linearly dependent if and only if it is not
linearly independent. For brevity, we may say that $\mathbf{v}_1, \dots, \mathbf{v}_p$ are linearly dependent when
we mean that $\{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ is a linearly dependent set. We use analogous terminology
for linearly independent sets.

EXAMPLE 1 Let $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}$, and $\mathbf{v}_3 = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}$.

a. Determine if the set $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is linearly independent.

b. If possible, find a linear dependence relation among $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$.

SOLUTION

a. We must determine if there is a nontrivial solution of equation (1) above. Row
operations on the associated augmented matrix show that

\[ \begin{bmatrix} 1 & 4 & 2 & 0 \\ 2 & 5 & 1 & 0 \\ 3 & 6 & 0 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & 4 & 2 & 0 \\ 0 & -3 & -3 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]

Clearly, $x_1$ and $x_2$ are basic variables, and $x_3$ is free. Each nonzero value of $x_3$
determines a nontrivial solution of (1). Hence $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ are linearly dependent (and
not linearly independent).

b. To find a linear dependence relation among $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$, completely row reduce
the augmented matrix and write the new system:

\[ \begin{bmatrix} 1 & 0 & -2 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \qquad \begin{aligned} x_1 - 2x_3 &= 0 \\ x_2 + x_3 &= 0 \\ 0 &= 0 \end{aligned} \]

Thus $x_1 = 2x_3$, $x_2 = -x_3$, and $x_3$ is free. Choose any nonzero value for $x_3$, say,
$x_3 = 5$. Then $x_1 = 10$ and $x_2 = -5$. Substitute these values into equation (1) and
obtain

\[ 10\mathbf{v}_1 - 5\mathbf{v}_2 + 5\mathbf{v}_3 = \mathbf{0} \]

This is one (out of infinitely many) possible linear dependence relations among $\mathbf{v}_1$,
$\mathbf{v}_2$, and $\mathbf{v}_3$. ■

Linear Independence of Matrix Columns

Suppose that we begin with a matrix $A = [\,\mathbf{a}_1 \ \cdots \ \mathbf{a}_n\,]$ instead of a set of vectors. The
matrix equation $A\mathbf{x} = \mathbf{0}$ can be written as

\[ x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \cdots + x_n\mathbf{a}_n = \mathbf{0} \]

Each linear dependence relation among the columns of $A$ corresponds to a nontrivial
solution of $A\mathbf{x} = \mathbf{0}$. Thus we have the following important fact.

The columns of a matrix $A$ are linearly independent if and only if the equation
$A\mathbf{x} = \mathbf{0}$ has only the trivial solution. (3)
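Fact (3) turns a question about vectors into a question about a homogeneous system, which is easy to automate. The following is a minimal sketch in Python, not part of the text: the helper names `rref` and `dependence_relation` are hypothetical, and exact rational arithmetic from the standard `fractions` module is assumed so that no rounding occurs. It places the vectors of Example 1 in the columns of a matrix, row reduces, and reads off one nontrivial solution by setting the first free variable to 1.

```python
from fractions import Fraction


def rref(rows):
    """Return (reduced echelon form, list of pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0  # index of the next pivot row
    for c in range(len(m[0])):
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        m[r], m[p] = m[p], m[r]
        m[r] = [x / m[r][c] for x in m[r]]  # scale the pivot to 1
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def dependence_relation(vectors):
    """Return weights, not all zero, with sum(c_i * v_i) = 0, or None
    when the vectors are linearly independent."""
    n = len(vectors[0])
    A = [[v[i] for v in vectors] for i in range(n)]  # vectors as columns
    R, pivots = rref(A)
    free = [j for j in range(len(vectors)) if j not in pivots]
    if not free:
        return None  # only the trivial solution
    weights = [Fraction(0)] * len(vectors)
    weights[free[0]] = Fraction(1)  # first free variable set to 1
    for row, pc in zip(R, pivots):
        weights[pc] = -row[free[0]]  # solve for each basic variable
    return weights


v1, v2, v3 = [1, 2, 3], [4, 5, 6], [2, 1, 0]
c = dependence_relation([v1, v2, v3])
print(c)
```

With the free variable set to 1 the weights are 2, -1, 1; multiplying by 5 gives the relation $10\mathbf{v}_1 - 5\mathbf{v}_2 + 5\mathbf{v}_3 = \mathbf{0}$ found in Example 1. A return value of `None` corresponds to the case in which every column is a pivot column.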


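EXAMPLE 2 Determine if the columns of the matrix $A = \begin{bmatrix} 0 & 1 & 4 \\ 1 & 2 & -1 \\ 5 & 8 & 0 \end{bmatrix}$ are
linearly independent.

SOLUTION To study $A\mathbf{x} = \mathbf{0}$, row reduce the augmented matrix:

\[ \begin{bmatrix} 0 & 1 & 4 & 0 \\ 1 & 2 & -1 & 0 \\ 5 & 8 & 0 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & 2 & -1 & 0 \\ 0 & 1 & 4 & 0 \\ 0 & -2 & 5 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & 2 & -1 & 0 \\ 0 & 1 & 4 & 0 \\ 0 & 0 & 13 & 0 \end{bmatrix} \]

At this point, it is clear that there are three basic variables and no free variables. So
the equation $A\mathbf{x} = \mathbf{0}$ has only the trivial solution, and the columns of $A$ are linearly
independent. ■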


Sets of One or Two Vectors 

A set containing only one vector (say, $\mathbf{v}$) is linearly independent if and only if $\mathbf{v}$ is not
the zero vector. This is because the vector equation $x_1\mathbf{v} = \mathbf{0}$ has only the trivial solution
when $\mathbf{v} \ne \mathbf{0}$. The zero vector is linearly dependent because $x_1\mathbf{0} = \mathbf{0}$ has many nontrivial
solutions.

The next example will explain the nature of a linearly dependent set of two vectors.


EXAMPLE 3 Determine if the following sets of vectors are linearly independent.

a. $\mathbf{v}_1 = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 6 \\ 2 \end{bmatrix}$

b. $\mathbf{v}_1 = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 6 \\ 2 \end{bmatrix}$

SOLUTION

a. Notice that $\mathbf{v}_2$ is a multiple of $\mathbf{v}_1$, namely, $\mathbf{v}_2 = 2\mathbf{v}_1$. Hence $-2\mathbf{v}_1 + \mathbf{v}_2 = \mathbf{0}$, which
shows that $\{\mathbf{v}_1, \mathbf{v}_2\}$ is linearly dependent.

b. The vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ are certainly not multiples of one another. Could they be
linearly dependent? Suppose $c$ and $d$ satisfy

\[ c\mathbf{v}_1 + d\mathbf{v}_2 = \mathbf{0} \]

If $c \ne 0$, then we can solve for $\mathbf{v}_1$ in terms of $\mathbf{v}_2$, namely, $\mathbf{v}_1 = (-d/c)\mathbf{v}_2$. This result
is impossible because $\mathbf{v}_1$ is not a multiple of $\mathbf{v}_2$. So $c$ must be zero. Similarly, $d$ must
also be zero. Thus $\{\mathbf{v}_1, \mathbf{v}_2\}$ is a linearly independent set. ■

The arguments in Example 3 show that you can always decide by inspection when a
set of two vectors is linearly dependent. Row operations are unnecessary. Simply check
whether at least one of the vectors is a scalar times the other. (The test applies only to
sets of two vectors.)

A set of two vectors $\{\mathbf{v}_1, \mathbf{v}_2\}$ is linearly dependent if at least one of the vectors is
a multiple of the other. The set is linearly independent if and only if neither of the
vectors is a multiple of the other.

In geometric terms, two vectors are linearly dependent if and only if they lie on the
same line through the origin. Figure 1 shows the vectors from Example 3.

FIGURE 1

Sets of Two or More Vectors

The proof of the next theorem is similar to the solution of Example 3. Details are given
at the end of this section.

THEOREM 7 Characterization of Linearly Dependent Sets

An indexed set $S = \{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ of two or more vectors is linearly dependent if
and only if at least one of the vectors in $S$ is a linear combination of the others. In
fact, if $S$ is linearly dependent and $\mathbf{v}_1 \ne \mathbf{0}$, then some $\mathbf{v}_j$ (with $j > 1$) is a linear
combination of the preceding vectors, $\mathbf{v}_1, \dots, \mathbf{v}_{j-1}$.

Warning: Theorem 7 does not say that every vector in a linearly dependent set is a
linear combination of the preceding vectors. A vector in a linearly dependent set may
fail to be a linear combination of the other vectors. See Practice Problem 1(c).


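EXAMPLE 4 Let $\mathbf{u} = \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} 1 \\ 6 \\ 0 \end{bmatrix}$. Describe the set spanned by $\mathbf{u}$ and $\mathbf{v}$,
and explain why a vector $\mathbf{w}$ is in Span$\{\mathbf{u}, \mathbf{v}\}$ if and only if $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$ is linearly dependent.

SOLUTION The vectors $\mathbf{u}$ and $\mathbf{v}$ are linearly independent because neither vector is
a multiple of the other, and so they span a plane in $\mathbb{R}^3$. (See Section 1.3.) In fact,
Span$\{\mathbf{u}, \mathbf{v}\}$ is the $x_1x_2$-plane (with $x_3 = 0$). If $\mathbf{w}$ is a linear combination of $\mathbf{u}$ and $\mathbf{v}$,
then $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$ is linearly dependent, by Theorem 7. Conversely, suppose that $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$
is linearly dependent. By Theorem 7, some vector in $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$ is a linear combination
of the preceding vectors (since $\mathbf{u} \ne \mathbf{0}$). That vector must be $\mathbf{w}$, since $\mathbf{v}$ is not a multiple
of $\mathbf{u}$. So $\mathbf{w}$ is in Span$\{\mathbf{u}, \mathbf{v}\}$. See Figure 2. ■

FIGURE 2 Linear dependence in $\mathbb{R}^3$. (Left panel: linearly dependent, $\mathbf{w}$ in Span$\{\mathbf{u}, \mathbf{v}\}$.
Right panel: linearly independent, $\mathbf{w}$ not in Span$\{\mathbf{u}, \mathbf{v}\}$.)

Example 4 generalizes to any set $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$ in $\mathbb{R}^3$ with $\mathbf{u}$ and $\mathbf{v}$ linearly independent.
The set $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$ will be linearly dependent if and only if $\mathbf{w}$ is in the plane spanned by
$\mathbf{u}$ and $\mathbf{v}$.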

The next two theorems describe special cases in which the linear dependence of a
set is automatic. Moreover, Theorem 8 will be a key result for work in later chapters.

THEOREM 8

If a set contains more vectors than there are entries in each vector, then the set
is linearly dependent. That is, any set $\{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ in $\mathbb{R}^n$ is linearly dependent if
$p > n$.

PROOF Let $A = [\,\mathbf{v}_1 \ \cdots \ \mathbf{v}_p\,]$. Then $A$ is $n \times p$, and the equation $A\mathbf{x} = \mathbf{0}$ corresponds
to a system of $n$ equations in $p$ unknowns. If $p > n$, there are more variables
than equations, so there must be a free variable. Hence $A\mathbf{x} = \mathbf{0}$ has a nontrivial solution,
and the columns of $A$ are linearly dependent. See Figure 3 for a matrix version of this
theorem. ■

FIGURE 3 If $p > n$, the columns are linearly dependent.

Warning: Theorem 8 says nothing about the case in which the number of vectors in
the set does not exceed the number of entries in each vector.

EXAMPLE 5 The vectors $\begin{bmatrix} 2 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 4 \\ -1 \end{bmatrix}$, $\begin{bmatrix} -2 \\ 2 \end{bmatrix}$ are linearly dependent by Theorem 8,
because there are three vectors in the set and there are only two entries in each vector.
Notice, however, that none of the vectors is a multiple of one of the other vectors. See
Figure 4. ■

FIGURE 4 A linearly dependent set in $\mathbb{R}^2$.

THEOREM 9 


If a set $S = \{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ in $\mathbb{R}^n$ contains the zero vector, then the set is linearly
dependent.

PROOF By renumbering the vectors, we may suppose $\mathbf{v}_1 = \mathbf{0}$. Then the equation
$1\mathbf{v}_1 + 0\mathbf{v}_2 + \cdots + 0\mathbf{v}_p = \mathbf{0}$ shows that $S$ is linearly dependent. ■


EXAMPLE 6 Determine by inspection if the given set is linearly dependent.

a. $\begin{bmatrix} 1 \\ 7 \\ 6 \end{bmatrix}$, $\begin{bmatrix} 2 \\ 0 \\ 9 \end{bmatrix}$, $\begin{bmatrix} 3 \\ 1 \\ 5 \end{bmatrix}$, $\begin{bmatrix} 4 \\ 1 \\ 8 \end{bmatrix}$

b. $\begin{bmatrix} 2 \\ 3 \\ 5 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 1 \\ 8 \end{bmatrix}$

c. $\begin{bmatrix} -2 \\ 4 \\ 6 \\ 10 \end{bmatrix}$, $\begin{bmatrix} 3 \\ -6 \\ -9 \\ 15 \end{bmatrix}$

SOLUTION

a. The set contains four vectors, each of which has only three entries. So the set is
linearly dependent by Theorem 8.

b. Theorem 8 does not apply here because the number of vectors does not exceed the
number of entries in each vector. Since the zero vector is in the set, the set is linearly
dependent by Theorem 9.

c. Compare the corresponding entries of the two vectors. The second vector seems to
be $-3/2$ times the first vector. This relation holds for the first three pairs of entries,
but fails for the fourth pair. Thus neither of the vectors is a multiple of the other, and
hence they are linearly independent. ■
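The inspection tests used in Example 6 can be collected into a short routine. The sketch below is illustrative only: the function names `parallel` and `inspect` are hypothetical, and the routine deliberately gives up (rather than row reducing) when none of the quick tests applies. It checks Theorem 8, then Theorem 9, then the multiple-of-the-other test for sets of two vectors.

```python
def parallel(a, b):
    """True when at least one of a, b is a scalar multiple of the other.

    This holds exactly when every 2x2 expression a[i]*b[j] - a[j]*b[i]
    is zero, so no division is needed.
    """
    n = len(a)
    return all(a[i] * b[j] == a[j] * b[i]
               for i in range(n) for j in range(i + 1, n))


def inspect(vectors):
    """Apply the quick tests of this section and return a short verdict."""
    p, n = len(vectors), len(vectors[0])
    if p > n:
        return "dependent (Theorem 8)"
    if any(all(entry == 0 for entry in v) for v in vectors):
        return "dependent (Theorem 9)"
    if p == 1:
        return "independent (single nonzero vector)"
    if p == 2:
        if parallel(vectors[0], vectors[1]):
            return "dependent (one vector is a multiple of the other)"
        return "independent (neither vector is a multiple of the other)"
    return "inconclusive by inspection; row reduce instead"


set_a = [[1, 7, 6], [2, 0, 9], [3, 1, 5], [4, 1, 8]]
set_b = [[2, 3, 5], [0, 0, 0], [1, 1, 8]]
set_c = [[-2, 4, 6, 10], [3, -6, -9, 15]]

for name, s in (("a", set_a), ("b", set_b), ("c", set_c)):
    print(name, inspect(s))
```

The last branch matters: for three or more nonzero vectors with $p \le n$, no inspection test in this section settles the question, which is the point of the warning after Theorem 8.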
Mastering: Linear Independence 1-31


In general, you should read a section thoroughly several times to absorb an 
important concept such as linear independence. The notes in the Study Guide for this 
section will help you learn to form mental images of key ideas in linear algebra. For 
instance, the following proof is worth reading carefully because it shows how the 
definition of linear independence can be used. 

PROOF OF THEOREM 7 (Characterization of Linearly Dependent Sets) 

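If some $\mathbf{v}_j$ in $S$ equals a linear combination of the other vectors, then $\mathbf{v}_j$ can be
subtracted from both sides of the equation, producing a linear dependence relation with
a nonzero weight $(-1)$ on $\mathbf{v}_j$. [For instance, if $\mathbf{v}_1 = c_2\mathbf{v}_2 + c_3\mathbf{v}_3$, then $\mathbf{0} = (-1)\mathbf{v}_1 +
c_2\mathbf{v}_2 + c_3\mathbf{v}_3 + 0\mathbf{v}_4 + \cdots + 0\mathbf{v}_p$.] Thus $S$ is linearly dependent.

Conversely, suppose $S$ is linearly dependent. If $\mathbf{v}_1$ is zero, then it is a (trivial)
linear combination of the other vectors in $S$. Otherwise, $\mathbf{v}_1 \ne \mathbf{0}$, and there exist weights
$c_1, \dots, c_p$, not all zero, such that

\[ c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_p\mathbf{v}_p = \mathbf{0} \]

Let $j$ be the largest subscript for which $c_j \ne 0$. If $j = 1$, then $c_1\mathbf{v}_1 = \mathbf{0}$, which is
impossible because $\mathbf{v}_1 \ne \mathbf{0}$. So $j > 1$, and

\[ c_1\mathbf{v}_1 + \cdots + c_j\mathbf{v}_j + 0\mathbf{v}_{j+1} + \cdots + 0\mathbf{v}_p = \mathbf{0} \]
\[ c_j\mathbf{v}_j = -c_1\mathbf{v}_1 - \cdots - c_{j-1}\mathbf{v}_{j-1} \]
\[ \mathbf{v}_j = \left(-\frac{c_1}{c_j}\right)\mathbf{v}_1 + \cdots + \left(-\frac{c_{j-1}}{c_j}\right)\mathbf{v}_{j-1} \]

Thus $\mathbf{v}_j$ is a linear combination of the preceding vectors. ■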



PRACTICE PROBLEMS 



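1. Let $\mathbf{u} = \begin{bmatrix} 3 \\ 2 \\ -4 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} -6 \\ 1 \\ 7 \end{bmatrix}$, $\mathbf{w} = \begin{bmatrix} 0 \\ -5 \\ 2 \end{bmatrix}$, and $\mathbf{z} = \begin{bmatrix} 3 \\ 7 \\ -5 \end{bmatrix}$.

a. Are the sets $\{\mathbf{u}, \mathbf{v}\}$, $\{\mathbf{u}, \mathbf{w}\}$, $\{\mathbf{u}, \mathbf{z}\}$, $\{\mathbf{v}, \mathbf{w}\}$, $\{\mathbf{v}, \mathbf{z}\}$, and $\{\mathbf{w}, \mathbf{z}\}$ each linearly
independent? Why or why not?

b. Does the answer to Part (a) imply that $\{\mathbf{u}, \mathbf{v}, \mathbf{w}, \mathbf{z}\}$ is linearly independent?

c. To determine if $\{\mathbf{u}, \mathbf{v}, \mathbf{w}, \mathbf{z}\}$ is linearly dependent, is it wise to check if, say, $\mathbf{w}$ is
a linear combination of $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{z}$?

d. Is $\{\mathbf{u}, \mathbf{v}, \mathbf{w}, \mathbf{z}\}$ linearly dependent?

2. Suppose that $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a linearly dependent set of vectors in $\mathbb{R}^n$ and $\mathbf{v}_4$ is a vector
in $\mathbb{R}^n$. Show that $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$ is also a linearly dependent set.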


1.7 EXERCISES 


In Exercises 1-4, determine if the vectors are linearly independent.
Justify each answer.


1 . 

1 

0 01 
_1 

5 

7 

2 

5 

9 

4 

2. 

1 

0 0 

1_ 

5 

0 

5 

5 

1 

_1 


1 

0 

_1 


1 

VO 

_1 


1 

00 

_1 


1 

<N 

_ 1 


1 

00 

_1 


i_ 



In Exercises 5-8, determine if the columns of the matrix form a 
linearly independent set. Justify each answer. 



1 

0 

00 

Ul 

_1 



1_ 

-3 

1 

0 


5 . 

3 

-7 

4 


6. 

0 

-1 

4 


-1 

5 

-4 


1 

0 

3 



1 

-3 

1_ 



5 

4 

1_ 



1 

4 

-3 

1 

0 


1 

-3 

3 

-2 

7 . 

-2 

-7 

5 

1 

8. 

-3 

7 

-1 

2 


_1 

-5 

7 

Ul 

1_ 


1 

0 

1 

-4 

3 


In Exercises 9 and 10, (a) for what values of $h$ is $\mathbf{v}_3$ in
Span$\{\mathbf{v}_1, \mathbf{v}_2\}$, and (b) for what values of $h$ is $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ linearly
dependent? Justify each answer.
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1 " 


-3 


5" 

9. 

Vi = 

-3 

2 _ 

, V 2 = 

i 

1_ 

,v 3 = 

-7 

h_ 



1 " 


~-2“ 


2 " 

10. 

Vi = 

i 

i_ 

,V 2 = 

10 

6 

, V 3 = 

-9 

h 


In Exercises 11-14, find the value(s) of $h$ for which the vectors
are linearly dependent. Justify each answer.



1 


i 

_ 1 


-1 

11. 

-1 

5 

-5 

5 

5 


i 

'xj- 

_ 1 


7 _ 


i 

-ss 

_ i 


1 “ 


i 

<N 

1 _ 


i 

_ 1 

13. 

5 


-9 

5 

h 


-3 


6 


-9 



i 

<N 

1_ 


i 

_i 


1 

oo 

1_ 

12. 

-4 

5 

7 

5 

h 


1_ 


_ — 3 _ 


i 

_i 


1 " 


i 

_1 


' 1" 

14. 

-1 


7 

9 

1 


3 


OO 


h 


Determine by inspection whether the vectors in Exercises 15-20 
are linearly independent. Justify each answer. 



In Exercises 21 and 22, mark each statement True or False. Justify 
each answer on the basis of a careful reading of the text. 

21. a. The columns of a matrix $A$ are linearly independent if the
equation $A\mathbf{x} = \mathbf{0}$ has the trivial solution.

b. If $S$ is a linearly dependent set, then each vector is a linear
combination of the other vectors in $S$.

c. The columns of any $4 \times 5$ matrix are linearly dependent.

d. If $\mathbf{x}$ and $\mathbf{y}$ are linearly independent, and if $\{\mathbf{x}, \mathbf{y}, \mathbf{z}\}$ is
linearly dependent, then $\mathbf{z}$ is in Span$\{\mathbf{x}, \mathbf{y}\}$.

22. a. Two vectors are linearly dependent if and only if they lie
on a line through the origin.

b. If a set contains fewer vectors than there are entries in the
vectors, then the set is linearly independent.

c. If $\mathbf{x}$ and $\mathbf{y}$ are linearly independent, and if $\mathbf{z}$ is in
Span$\{\mathbf{x}, \mathbf{y}\}$, then $\{\mathbf{x}, \mathbf{y}, \mathbf{z}\}$ is linearly dependent.

d. If a set in $\mathbb{R}^n$ is linearly dependent, then the set contains
more vectors than there are entries in each vector.

In Exercises 23-26, describe the possible echelon forms of the 
matrix. Use the notation of Example 1 in Section 1.2. 

23. $A$ is a $3 \times 3$ matrix with linearly independent columns.

24. $A$ is a $2 \times 2$ matrix with linearly dependent columns.

25. $A$ is a $4 \times 2$ matrix, $A = [\,\mathbf{a}_1 \ \mathbf{a}_2\,]$, and $\mathbf{a}_2$ is not a multiple of
$\mathbf{a}_1$.

26. $A$ is a $4 \times 3$ matrix, $A = [\,\mathbf{a}_1 \ \mathbf{a}_2 \ \mathbf{a}_3\,]$, such that $\{\mathbf{a}_1, \mathbf{a}_2\}$ is
linearly independent and $\mathbf{a}_3$ is not in Span$\{\mathbf{a}_1, \mathbf{a}_2\}$.

27. How many pivot columns must a $7 \times 5$ matrix have if its
columns are linearly independent? Why?

28. How many pivot columns must a $5 \times 7$ matrix have if its
columns span $\mathbb{R}^5$? Why?

29. Construct $3 \times 2$ matrices $A$ and $B$ such that $A\mathbf{x} = \mathbf{0}$ has only
the trivial solution and $B\mathbf{x} = \mathbf{0}$ has a nontrivial solution.

30. a. Fill in the blank in the following statement: “If $A$ is
an $m \times n$ matrix, then the columns of $A$ are linearly
independent if and only if $A$ has ____ pivot columns.”

b. Explain why the statement in (a) is true.

Exercises 31 and 32 should be solved without performing row
operations. [Hint: Write $A\mathbf{x} = \mathbf{0}$ as a vector equation.]


31. Given $A = \begin{bmatrix} 2 & 3 & 5 \\ -5 & 1 & -4 \\ -3 & -1 & -4 \\ 1 & 0 & 1 \end{bmatrix}$, observe that the third column
is the sum of the first two columns. Find a nontrivial solution
of $A\mathbf{x} = \mathbf{0}$.

32. Given $A = \begin{bmatrix} 4 & 1 & 6 \\ -7 & 5 & 3 \\ 9 & -3 & 3 \end{bmatrix}$, observe that the first column
plus twice the second column equals the third column. Find
a nontrivial solution of $A\mathbf{x} = \mathbf{0}$.


Each statement in Exercises 33-38 is either true (in all cases) 
or false (for at least one example). If false, construct a specific 
example to show that the statement is not always true. Such an 
example is called a counterexample to the statement. If a statement 
is true, give a justification. (One specific example cannot explain 
why a statement is always true. You will have to do more work 
here than in Exercises 21 and 22.) 


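33. If $\mathbf{v}_1, \dots, \mathbf{v}_4$ are in $\mathbb{R}^4$ and $\mathbf{v}_3 = 2\mathbf{v}_1 + \mathbf{v}_2$, then $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$
is linearly dependent.

34. If $\mathbf{v}_1, \dots, \mathbf{v}_4$ are in $\mathbb{R}^4$ and $\mathbf{v}_3 = \mathbf{0}$, then $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$ is
linearly dependent.

35. If $\mathbf{v}_1$ and $\mathbf{v}_2$ are in $\mathbb{R}^4$ and $\mathbf{v}_2$ is not a scalar multiple of $\mathbf{v}_1$,
then $\{\mathbf{v}_1, \mathbf{v}_2\}$ is linearly independent.

36. If $\mathbf{v}_1, \dots, \mathbf{v}_4$ are in $\mathbb{R}^4$ and $\mathbf{v}_3$ is not a linear combination of
$\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_4$, then $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$ is linearly independent.

37. If $\mathbf{v}_1, \dots, \mathbf{v}_4$ are in $\mathbb{R}^4$ and $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is linearly dependent,
then $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$ is also linearly dependent.

38. If $\mathbf{v}_1, \dots, \mathbf{v}_4$ are linearly independent vectors in $\mathbb{R}^4$, then
$\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is also linearly independent. [Hint: Think about
$x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + x_3\mathbf{v}_3 + 0 \cdot \mathbf{v}_4 = \mathbf{0}$.]

39. Suppose $A$ is an $m \times n$ matrix with the property that for all $\mathbf{b}$
in $\mathbb{R}^m$ the equation $A\mathbf{x} = \mathbf{b}$ has at most one solution. Use the
definition of linear independence to explain why the columns
of $A$ must be linearly independent.

40. Suppose an $m \times n$ matrix $A$ has $n$ pivot columns. Explain
why for each $\mathbf{b}$ in $\mathbb{R}^m$ the equation $A\mathbf{x} = \mathbf{b}$ has at most one
solution. [Hint: Explain why $A\mathbf{x} = \mathbf{b}$ cannot have infinitely
many solutions.]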


[M] In Exercises 41 and 42, use as many columns of A as possible 
to construct a matrix B with the property that the equation Bx = 0 
has only the trivial solution. Solve Bx = 0 to verify your work. 


41. A = 


00 

-3 

0 

-7 

2 

9 

4 

5 

11 

-7 

6 

-2 

2 

-4 

4 

5 

-1 

7 

0 

10 


42. A = 


12 

10 

-6 

-3 

7 

10 

-7 

-6 

4 

7 

-9 

5 

9 

9 

-9 

-5 

5 

-1 

-4 

-3 

1 

6 

-8 

9 

OO 

7 

-5 

-9 

11 

-8 


43. [M] With $A$ and $B$ as in Exercise 41, select a column $\mathbf{v}$ of $A$
that was not used in the construction of $B$ and determine if
$\mathbf{v}$ is in the set spanned by the columns of $B$. (Describe your
calculations.)

44. [M] Repeat Exercise 43 with the matrices $A$ and $B$ from
Exercise 42. Then give an explanation for what you discover,
assuming that $B$ was constructed as specified.



SOLUTIONS TO PRACTICE PROBLEMS 

1. a. Yes. In each case, neither vector is a multiple of the other. Thus each set is linearly
independent.

b. No. The observation in Part (a), by itself, says nothing about the linear independence
of $\{\mathbf{u}, \mathbf{v}, \mathbf{w}, \mathbf{z}\}$.

c. No. When testing for linear independence, it is usually a poor idea to check if
one selected vector is a linear combination of the others. It may happen that
the selected vector is not a linear combination of the others and yet the whole
set of vectors is linearly dependent. In this practice problem, $\mathbf{w}$ is not a linear
combination of $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{z}$.

d. Yes, by Theorem 8. There are more vectors (four) than entries (three) in them.

2. Applying the definition of linearly dependent to $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ implies that there exist
scalars $c_1$, $c_2$, and $c_3$, not all zero, such that

\[ c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3 = \mathbf{0} \]

Adding $0\mathbf{v}_4 = \mathbf{0}$ to both sides of this equation results in

\[ c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3 + 0\mathbf{v}_4 = \mathbf{0} \]

Since $c_1$, $c_2$, $c_3$, and 0 are not all zero, the set $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$ satisfies the definition of
a linearly dependent set.
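The claims in the solution to Practice Problem 1 can be checked numerically. The sketch below is one possible approach, not the method of the text: it assumes a hypothetical helper `rank` that counts pivot positions by forward elimination with exact fractions, and it uses the fact that a vector lies in the span of some columns exactly when appending it does not create a new pivot.

```python
from fractions import Fraction


def rank(rows):
    """Number of pivot positions, found by forward elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def columns(*vectors):
    """Matrix (as a list of rows) whose columns are the given vectors."""
    return [list(row) for row in zip(*vectors)]


u, v, w, z = [3, 2, -4], [-6, 1, 7], [0, -5, 2], [3, 7, -5]

# Part (d): four vectors in R^3 cannot all be pivot columns.
dependent = rank(columns(u, v, w, z)) < 4

# Part (c): w is in Span{u, v, z} only if adding w leaves the rank unchanged.
w_in_span = rank(columns(u, v, z)) == rank(columns(u, v, z, w))

# By contrast, z is a combination of u and v (in fact z = 3u + v).
z_in_span = rank(columns(u, v)) == rank(columns(u, v, z))

print(dependent, w_in_span, z_in_span)
```

The set is dependent even though $\mathbf{w}$ is not a combination of the other three vectors; the dependence comes from $\mathbf{z} = 3\mathbf{u} + \mathbf{v}$, which is why testing a single selected vector can mislead.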


1.8 INTRODUCTION TO LINEAR TRANSFORMATIONS 


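The difference between a matrix equation $A\mathbf{x} = \mathbf{b}$ and the associated vector equation
$x_1\mathbf{a}_1 + \cdots + x_n\mathbf{a}_n = \mathbf{b}$ is merely a matter of notation. However, a matrix equation
$A\mathbf{x} = \mathbf{b}$ can arise in linear algebra (and in applications such as computer graphics and
signal processing) in a way that is not directly connected with linear combinations of
vectors. This happens when we think of the matrix $A$ as an object that “acts” on a vector
$\mathbf{x}$ by multiplication to produce a new vector called $A\mathbf{x}$.
For instance, the equations

\[ \underbrace{\begin{bmatrix} 4 & -3 & 1 & 3 \\ 2 & 0 & 5 & 1 \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}}_{\mathbf{x}} = \underbrace{\begin{bmatrix} 5 \\ 8 \end{bmatrix}}_{\mathbf{b}} \quad \text{and} \quad \underbrace{\begin{bmatrix} 4 & -3 & 1 & 3 \\ 2 & 0 & 5 & 1 \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} 1 \\ 4 \\ -1 \\ 3 \end{bmatrix}}_{\mathbf{u}} = \underbrace{\begin{bmatrix} 0 \\ 0 \end{bmatrix}}_{\mathbf{0}} \]

say that multiplication by $A$ transforms $\mathbf{x}$ into $\mathbf{b}$ and transforms $\mathbf{u}$ into the zero vector.
See Figure 1.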


FIGURE 1 Transforming vectors via matrix multiplication.


From this new point of view, solving the equation $A\mathbf{x} = \mathbf{b}$ amounts to finding
all vectors $\mathbf{x}$ in $\mathbb{R}^4$ that are transformed into the vector $\mathbf{b}$ in $\mathbb{R}^2$ under the “action” of
multiplication by $A$.

The correspondence from $\mathbf{x}$ to $A\mathbf{x}$ is a function from one set of vectors to another.
This concept generalizes the common notion of a function as a rule that transforms one
real number into another.

A transformation (or function or mapping) $T$ from $\mathbb{R}^n$ to $\mathbb{R}^m$ is a rule that assigns
to each vector $\mathbf{x}$ in $\mathbb{R}^n$ a vector $T(\mathbf{x})$ in $\mathbb{R}^m$. The set $\mathbb{R}^n$ is called the domain of $T$, and $\mathbb{R}^m$
is called the codomain of $T$. The notation $T : \mathbb{R}^n \to \mathbb{R}^m$ indicates that the domain of $T$
is $\mathbb{R}^n$ and the codomain is $\mathbb{R}^m$. For $\mathbf{x}$ in $\mathbb{R}^n$, the vector $T(\mathbf{x})$ in $\mathbb{R}^m$ is called the image of $\mathbf{x}$
(under the action of $T$). The set of all images $T(\mathbf{x})$ is called the range of $T$. See Figure 2.

FIGURE 2 Domain, codomain, and range of $T : \mathbb{R}^n \to \mathbb{R}^m$.

The new terminology in this section is important because a dynamic view of
matrix-vector multiplication is the key to understanding several ideas in linear algebra and to
building mathematical models of physical systems that evolve over time. Such dynamical
systems will be discussed in Sections 1.10, 4.8, and 4.9 and throughout Chapter 5.

Matrix Transformations 

The rest of this section focuses on mappings associated with matrix multiplication. For
each $\mathbf{x}$ in $\mathbb{R}^n$, $T(\mathbf{x})$ is computed as $A\mathbf{x}$, where $A$ is an $m \times n$ matrix. For simplicity, we
sometimes denote such a matrix transformation by $\mathbf{x} \mapsto A\mathbf{x}$. Observe that the domain of
$T$ is $\mathbb{R}^n$ when $A$ has $n$ columns and the codomain of $T$ is $\mathbb{R}^m$ when each column of $A$
has $m$ entries. The range of $T$ is the set of all linear combinations of the columns of $A$,
because each image $T(\mathbf{x})$ is of the form $A\mathbf{x}$.




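EXAMPLE 1 Let $A = \begin{bmatrix} 1 & -3 \\ 3 & 5 \\ -1 & 7 \end{bmatrix}$, $\mathbf{u} = \begin{bmatrix} 2 \\ -1 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} 3 \\ 2 \\ -5 \end{bmatrix}$, $\mathbf{c} = \begin{bmatrix} 3 \\ 2 \\ 5 \end{bmatrix}$, and
define a transformation $T : \mathbb{R}^2 \to \mathbb{R}^3$ by $T(\mathbf{x}) = A\mathbf{x}$, so that

\[ T(\mathbf{x}) = A\mathbf{x} = \begin{bmatrix} 1 & -3 \\ 3 & 5 \\ -1 & 7 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_1 - 3x_2 \\ 3x_1 + 5x_2 \\ -x_1 + 7x_2 \end{bmatrix} \]

a. Find $T(\mathbf{u})$, the image of $\mathbf{u}$ under the transformation $T$.

b. Find an $\mathbf{x}$ in $\mathbb{R}^2$ whose image under $T$ is $\mathbf{b}$.

c. Is there more than one $\mathbf{x}$ whose image under $T$ is $\mathbf{b}$?

d. Determine if $\mathbf{c}$ is in the range of the transformation $T$.

SOLUTION

a. Compute

\[ T(\mathbf{u}) = A\mathbf{u} = \begin{bmatrix} 1 & -3 \\ 3 & 5 \\ -1 & 7 \end{bmatrix} \begin{bmatrix} 2 \\ -1 \end{bmatrix} = \begin{bmatrix} 5 \\ 1 \\ -9 \end{bmatrix} \]

b. Solve $T(\mathbf{x}) = \mathbf{b}$ for $\mathbf{x}$. That is, solve $A\mathbf{x} = \mathbf{b}$, or

\[ \begin{bmatrix} 1 & -3 \\ 3 & 5 \\ -1 & 7 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \\ -5 \end{bmatrix} \tag{1} \]

Using the method discussed in Section 1.4, row reduce the augmented matrix:

\[ \begin{bmatrix} 1 & -3 & 3 \\ 3 & 5 & 2 \\ -1 & 7 & -5 \end{bmatrix} \sim \begin{bmatrix} 1 & -3 & 3 \\ 0 & 14 & -7 \\ 0 & 4 & -2 \end{bmatrix} \sim \begin{bmatrix} 1 & -3 & 3 \\ 0 & 1 & -.5 \\ 0 & 0 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 1.5 \\ 0 & 1 & -.5 \\ 0 & 0 & 0 \end{bmatrix} \tag{2} \]

Hence $x_1 = 1.5$, $x_2 = -.5$, and $\mathbf{x} = \begin{bmatrix} 1.5 \\ -.5 \end{bmatrix}$. The image of this $\mathbf{x}$ under $T$ is the
given vector $\mathbf{b}$.

c. Any $\mathbf{x}$ whose image under $T$ is $\mathbf{b}$ must satisfy equation (1). From (2), it is clear that
equation (1) has a unique solution. So there is exactly one $\mathbf{x}$ whose image is $\mathbf{b}$.

d. The vector $\mathbf{c}$ is in the range of $T$ if $\mathbf{c}$ is the image of some $\mathbf{x}$ in $\mathbb{R}^2$, that is, if $\mathbf{c} = T(\mathbf{x})$
for some $\mathbf{x}$. This is just another way of asking if the system $A\mathbf{x} = \mathbf{c}$ is consistent. To
find the answer, row reduce the augmented matrix:

\[ \begin{bmatrix} 1 & -3 & 3 \\ 3 & 5 & 2 \\ -1 & 7 & 5 \end{bmatrix} \sim \begin{bmatrix} 1 & -3 & 3 \\ 0 & 14 & -7 \\ 0 & 4 & 8 \end{bmatrix} \sim \begin{bmatrix} 1 & -3 & 3 \\ 0 & 1 & 2 \\ 0 & 14 & -7 \end{bmatrix} \sim \begin{bmatrix} 1 & -3 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & -35 \end{bmatrix} \]

The third equation, $0 = -35$, shows that the system is inconsistent. So $\mathbf{c}$ is not in the
range of $T$. ■

The question in Example 1(c) is a uniqueness problem for a system of linear
equations, translated here into the language of matrix transformations: Is $\mathbf{b}$ the image of
a unique $\mathbf{x}$ in $\mathbb{R}^n$? Similarly, Example 1(d) is an existence problem: Does there exist an
$\mathbf{x}$ whose image is $\mathbf{c}$?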
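The computations in Example 1 can be reproduced with a few lines of Python. This is a sketch under stated assumptions rather than a general solver: the function names `T` and `preimage` are hypothetical, and the shortcut in `preimage` works only because the first two rows of this particular $A$ form an invertible $2 \times 2$ system. It solves those two equations by Cramer's rule and then tests whether the third equation is also satisfied.

```python
from fractions import Fraction

A = [[1, -3], [3, 5], [-1, 7]]


def T(x):
    """Image of x under the matrix transformation x -> Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def preimage(target):
    """Return the unique x with T(x) = target, or None if there is none.

    Assumes rows 1 and 2 of A have a nonzero 2x2 determinant (here 14),
    so those two equations already determine x; the third row is a check.
    """
    (a, b), (c, d) = A[0], A[1]
    det = a * d - b * c
    x1 = Fraction(target[0] * d - b * target[1], det)
    x2 = Fraction(a * target[1] - c * target[0], det)
    x = [x1, x2]
    return x if T(x) == list(target) else None


image_of_u = T([2, -1])          # part (a)
x_for_b = preimage([3, 2, -5])   # parts (b) and (c)
x_for_c = preimage([3, 2, 5])    # part (d)

print(image_of_u, x_for_b, x_for_c)
```

Because the first two equations have exactly one solution, any $\mathbf{x}$ that maps to $\mathbf{b}$ is unique, which matches part (c). The `None` returned for $\mathbf{c}$ corresponds to the inconsistent row $0 = -35$ in part (d).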
The next two matrix transformations can be viewed geometrically. They reinforce
the dynamic view of a matrix as something that transforms vectors into other vectors.
Section 2.7 contains other interesting examples connected with computer graphics.

FIGURE 3 A projection transformation.


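EXAMPLE 2 If $A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$, then the transformation $\mathbf{x} \mapsto A\mathbf{x}$ projects
points in $\mathbb{R}^3$ onto the $x_1x_2$-plane because

\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \mapsto \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ 0 \end{bmatrix} \]

See Figure 3. ■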


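EXAMPLE 3 Let $A = \begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix}$. The transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$ defined by
$T(\mathbf{x}) = A\mathbf{x}$ is called a shear transformation. It can be shown that if $T$ acts on each
point in the $2 \times 2$ square shown in Figure 4, then the set of images forms the shaded
parallelogram. The key idea is to show that $T$ maps line segments onto line segments
(as shown in Exercise 27) and then to check that the corners of the square map onto
the vertices of the parallelogram. For instance, the image of the point $\mathbf{u} = \begin{bmatrix} 0 \\ 2 \end{bmatrix}$ is

\[ T(\mathbf{u}) = \begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 2 \end{bmatrix} = \begin{bmatrix} 6 \\ 2 \end{bmatrix}, \quad \text{and the image of } \begin{bmatrix} 2 \\ 2 \end{bmatrix} \text{ is } \begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ 2 \end{bmatrix} = \begin{bmatrix} 8 \\ 2 \end{bmatrix} \]

$T$ deforms the square as if the top of the square were pushed to the right while the base is
held fixed. Shear transformations appear in physics, geology, and crystallography. ■

FIGURE 4 A shear transformation.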
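The corner check described in Example 3 is easy to carry out for all four corners at once. The sketch below assumes the square has corners (0, 0), (2, 0), (2, 2), and (0, 2), which is consistent with the two points computed in the example; the name `shear` is hypothetical.

```python
A = [[1, 3], [0, 1]]


def shear(point):
    """Image of a point in R^2 under x -> Ax for the shear matrix A."""
    x1, x2 = point
    return [A[0][0] * x1 + A[0][1] * x2, A[1][0] * x1 + A[1][1] * x2]


# Corners of the 2 x 2 square, listed counterclockwise from the origin.
square = [[0, 0], [2, 0], [2, 2], [0, 2]]
images = [shear(p) for p in square]

for p, q in zip(square, images):
    print(p, "->", q)
```

The two corners on the base ($x_2 = 0$) are unchanged, while the two corners on the top edge move 6 units to the right, which is the "pushed to the right while the base is held fixed" behavior described above.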


Linear Transformations

Theorem 5 in Section 1.4 shows that if $A$ is $m \times n$, then the transformation $\mathbf{x} \mapsto A\mathbf{x}$ has
the properties

\[ A(\mathbf{u} + \mathbf{v}) = A\mathbf{u} + A\mathbf{v} \quad \text{and} \quad A(c\mathbf{u}) = cA\mathbf{u} \]

for all $\mathbf{u}$, $\mathbf{v}$ in $\mathbb{R}^n$ and all scalars $c$. These properties, written in function notation, identify
the most important class of transformations in linear algebra.

DEFINITION

A transformation (or mapping) $T$ is linear if:

(i) $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$ for all $\mathbf{u}$, $\mathbf{v}$ in the domain of $T$;

(ii) $T(c\mathbf{u}) = cT(\mathbf{u})$ for all scalars $c$ and all $\mathbf{u}$ in the domain of $T$.

Every matrix transformation is a linear transformation. Important examples of
linear transformations that are not matrix transformations will be discussed in Chapters
4 and 5.
Linear transformations preserve the operations of vector addition and scalar
multiplication. Property (i) says that the result $T(\mathbf{u} + \mathbf{v})$ of first adding $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$ and
then applying $T$ is the same as first applying $T$ to $\mathbf{u}$ and to $\mathbf{v}$ and then adding $T(\mathbf{u})$ and
$T(\mathbf{v})$ in $\mathbb{R}^m$. These two properties lead easily to the following useful facts.

If $T$ is a linear transformation, then

\[ T(\mathbf{0}) = \mathbf{0} \tag{3} \]

and

\[ T(c\mathbf{u} + d\mathbf{v}) = cT(\mathbf{u}) + dT(\mathbf{v}) \tag{4} \]

for all vectors $\mathbf{u}$, $\mathbf{v}$ in the domain of $T$ and all scalars $c$, $d$.

Property (3) follows from condition (ii) in the definition, because $T(\mathbf{0}) = T(0\mathbf{u}) =
0T(\mathbf{u}) = \mathbf{0}$. Property (4) requires both (i) and (ii):

\[ T(c\mathbf{u} + d\mathbf{v}) = T(c\mathbf{u}) + T(d\mathbf{v}) = cT(\mathbf{u}) + dT(\mathbf{v}) \]

Observe that if a transformation satisfies (4) for all $\mathbf{u}$, $\mathbf{v}$ and $c$, $d$, it must be linear.
(Set $c = d = 1$ for preservation of addition, and set $d = 0$ for preservation of scalar
multiplication.) Repeated application of (4) produces a useful generalization:

\[ T(c_1\mathbf{v}_1 + \cdots + c_p\mathbf{v}_p) = c_1T(\mathbf{v}_1) + \cdots + c_pT(\mathbf{v}_p) \tag{5} \]

In engineering and physics, (5) is referred to as a superposition principle. Think
of $\mathbf{v}_1, \dots, \mathbf{v}_p$ as signals that go into a system and $T(\mathbf{v}_1), \dots, T(\mathbf{v}_p)$ as the responses of
that system to the signals. The system satisfies the superposition principle if whenever
an input is expressed as a linear combination of such signals, the system’s response is
the same linear combination of the responses to the individual signals. We will return to
this idea in Chapter 4.
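Equation (5) can be spot-checked for a matrix transformation. The sketch below reuses the $2 \times 4$ matrix and the two vectors from the opening of this section; the weights 3 and -2 are arbitrary choices made only for illustration, and the helper names `T` and `combo` are hypothetical. A numerical check of one case is not a proof, but it shows what the superposition principle asserts.

```python
A = [[4, -3, 1, 3], [2, 0, 5, 1]]


def T(x):
    """Matrix transformation x -> Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def combo(weights, vectors):
    """Linear combination c_1 v_1 + ... + c_p v_p, computed entry by entry."""
    n = len(vectors[0])
    return [sum(c * v[i] for c, v in zip(weights, vectors)) for i in range(n)]


signals = [[1, 1, 1, 1], [1, 4, -1, 3]]  # the vectors x and u from above
weights = [3, -2]                        # arbitrary illustrative weights

# Left side of (5): combine the inputs first, then apply T.
lhs = T(combo(weights, signals))

# Right side of (5): apply T to each input, then combine the responses.
rhs = combo(weights, [T(v) for v in signals])

print(lhs, rhs)
```

Since $T$ sends the second signal to the zero vector, the response to the combined input is just 3 times the response to the first signal.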


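EXAMPLE 4 Given a scalar $r$, define $T : \mathbb{R}^2 \to \mathbb{R}^2$ by $T(\mathbf{x}) = r\mathbf{x}$. $T$ is called a
contraction when $0 \le r \le 1$ and a dilation when $r > 1$. Let $r = 3$, and show that $T$ is
a linear transformation.

SOLUTION Let $\mathbf{u}$, $\mathbf{v}$ be in $\mathbb{R}^2$ and let $c$, $d$ be scalars. Then

\[ \begin{aligned} T(c\mathbf{u} + d\mathbf{v}) &= 3(c\mathbf{u} + d\mathbf{v}) && \text{Definition of } T \\ &= 3c\mathbf{u} + 3d\mathbf{v} && \text{Vector arithmetic} \\ &= c(3\mathbf{u}) + d(3\mathbf{v}) && \text{Vector arithmetic} \\ &= cT(\mathbf{u}) + dT(\mathbf{v}) \end{aligned} \]

Thus $T$ is a linear transformation because it satisfies (4). See Figure 5. ■

FIGURE 5 A dilation transformation.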
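EXAMPLE 5 Define a linear transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$ by

\[ T(\mathbf{x}) = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} -x_2 \\ x_1 \end{bmatrix} \]

Find the images under $T$ of $\mathbf{u} = \begin{bmatrix} 4 \\ 1 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$, and $\mathbf{u} + \mathbf{v} = \begin{bmatrix} 6 \\ 4 \end{bmatrix}$.

SOLUTION

\[ T(\mathbf{u}) = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 4 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 4 \end{bmatrix}, \quad T(\mathbf{v}) = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} -3 \\ 2 \end{bmatrix}, \quad T(\mathbf{u} + \mathbf{v}) = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 6 \\ 4 \end{bmatrix} = \begin{bmatrix} -4 \\ 6 \end{bmatrix} \]

Note that $T(\mathbf{u} + \mathbf{v})$ is obviously equal to $T(\mathbf{u}) + T(\mathbf{v})$. It appears from Figure 6 that
$T$ rotates $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{u} + \mathbf{v}$ counterclockwise about the origin through 90°. In fact, $T$
transforms the entire parallelogram determined by $\mathbf{u}$ and $\mathbf{v}$ into the one determined by
$T(\mathbf{u})$ and $T(\mathbf{v})$. (See Exercise 28.) ■

FIGURE 6 A rotation transformation.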

The final example is not geometrical; instead, it shows how a linear mapping can 
transform one type of data into another. 


EXAMPLE 6 A company manufactures two products, B and C. Using data from
Example 7 in Section 1.3, we construct a “unit cost” matrix, $U = [\,\mathbf{b} \ \mathbf{c}\,]$, whose
columns describe the “costs per dollar of output” for the products:

           Product
          B      C
U =    [ .45    .40 ]   Materials
       [ .25    .30 ]   Labor
       [ .15    .15 ]   Overhead

Let $\mathbf{x} = (x_1, x_2)$ be a “production” vector, corresponding to $x_1$ dollars of product B and
$x_2$ dollars of product C, and define $T : \mathbb{R}^2 \to \mathbb{R}^3$ by

\[ T(\mathbf{x}) = U\mathbf{x} = x_1 \begin{bmatrix} .45 \\ .25 \\ .15 \end{bmatrix} + x_2 \begin{bmatrix} .40 \\ .30 \\ .15 \end{bmatrix} = \begin{bmatrix} \text{Total cost of materials} \\ \text{Total cost of labor} \\ \text{Total cost of overhead} \end{bmatrix} \]

The mapping $T$ transforms a list of production quantities (measured in dollars) into a
list of total costs. The linearity of this mapping is reflected in two ways:

1. If production is increased by a factor of, say, 4, from $\mathbf{x}$ to $4\mathbf{x}$, then the costs will
increase by the same factor, from $T(\mathbf{x})$ to $4T(\mathbf{x})$.

2. If $\mathbf{x}$ and $\mathbf{y}$ are production vectors, then the total cost vector associated with the
combined production $\mathbf{x} + \mathbf{y}$ is precisely the sum of the cost vectors $T(\mathbf{x})$ and
$T(\mathbf{y})$. ■
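The two linearity properties in Example 6 can be seen with concrete numbers. In the sketch below the production vectors `x` and `y` are made-up illustrative values, not data from the text, and the function name `total_costs` is hypothetical; the unit costs are stored as exact fractions so that the comparisons are not affected by floating-point rounding.

```python
from fractions import Fraction

# Unit cost matrix U: rows are materials, labor, overhead;
# columns are product B and product C.
U = [[Fraction("0.45"), Fraction("0.40")],
     [Fraction("0.25"), Fraction("0.30")],
     [Fraction("0.15"), Fraction("0.15")]]


def total_costs(x):
    """T(x) = Ux for a production vector x = (dollars of B, dollars of C)."""
    return [row[0] * x[0] + row[1] * x[1] for row in U]


x = [100, 200]  # hypothetical production: $100 of B and $200 of C
y = [50, 10]    # a second hypothetical production vector

costs_x = total_costs(x)
scaled = total_costs([4 * xi for xi in x])
combined = total_costs([xi + yi for xi, yi in zip(x, y)])

print([float(c) for c in costs_x])
print(scaled == [4 * c for c in costs_x])
print(combined == [a + b for a, b in zip(costs_x, total_costs(y))])
```

For the first production vector the costs are 125 for materials, 85 for labor, and 45 for overhead, and the two comparisons confirm properties 1 and 2 for these particular inputs.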


PRACTICE PROBLEMS 


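1. Suppose $T : \mathbb{R}^5 \to \mathbb{R}^2$ and $T(\mathbf{x}) = A\mathbf{x}$ for some matrix $A$ and for each $\mathbf{x}$ in $\mathbb{R}^5$. How
many rows and columns does $A$ have?

2. Let $A = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$. Give a geometric description of the transformation $\mathbf{x} \mapsto A\mathbf{x}$.

3. The line segment from $\mathbf{0}$ to a vector $\mathbf{u}$ is the set of points of the form $t\mathbf{u}$, where
$0 \le t \le 1$. Show that a linear transformation $T$ maps this segment into the segment
between $\mathbf{0}$ and $T(\mathbf{u})$.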


1.8 EXERCISES 



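1. Let $A = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$, and define $T : \mathbb{R}^2 \to \mathbb{R}^2$ by $T(\mathbf{x}) = A\mathbf{x}$.
Find the images under $T$ of $\mathbf{u} = \begin{bmatrix} 1 \\ 3 \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} a \\ b \end{bmatrix}$.

2. Let $A = \begin{bmatrix} .5 & 0 & 0 \\ 0 & .5 & 0 \\ 0 & 0 & .5 \end{bmatrix}$, $\mathbf{u} = \begin{bmatrix} 1 \\ 0 \\ -4 \end{bmatrix}$, and $\mathbf{v} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$.
Define $T : \mathbb{R}^3 \to \mathbb{R}^3$ by $T(\mathbf{x}) = A\mathbf{x}$. Find $T(\mathbf{u})$ and $T(\mathbf{v})$.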


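In Exercises 3-6, with $T$ defined by $T(\mathbf{x}) = A\mathbf{x}$, find a vector $\mathbf{x}$
whose image under $T$ is $\mathbf{b}$, and determine whether $\mathbf{x}$ is unique.

3. $A = \begin{bmatrix} 1 & 0 & -2 \\ -2 & 1 & 6 \\ 3 & -2 & -5 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} -1 \\ 7 \\ -3 \end{bmatrix}$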


4. A = 


1 

0 

3 




5. A = 





6 . A = 




7. Let $A$ be a $6 \times 5$ matrix. What must $a$ and $b$ be in order to
define $T : \mathbb{R}^a \to \mathbb{R}^b$ by $T(\mathbf{x}) = A\mathbf{x}$?

8. How many rows and columns must a matrix $A$ have in order
to define a mapping from $\mathbb{R}^4$ into $\mathbb{R}^5$ by the rule $T(\mathbf{x}) = A\mathbf{x}$?

For Exercises 9 and 10, find all $\mathbf{x}$ in $\mathbb{R}^4$ that are mapped into the
zero vector by the transformation $\mathbf{x} \mapsto A\mathbf{x}$ for the given matrix $A$.

9. $A = \begin{bmatrix} 1 & -4 & 7 & -5 \\ 0 & 1 & -4 & 3 \\ 2 & -6 & 6 & -4 \end{bmatrix}$


10 . A = 




11. Let $\mathbf{b} =$



and let A be the matrix in Exercise 9. Is b in 


the range of the linear transformation $\mathbf{x} \mapsto A\mathbf{x}$? Why or why
not?


12. Let b = 



and let A be the matrix in Exercise 10. Is 


$\mathbf{b}$ in the range of the linear transformation $\mathbf{x} \mapsto A\mathbf{x}$? Why or
why not?


In Exercises 13-16, use a rectangular coordinate system to plot 




, and their images under the given transformation $T$. (Make a separate and reasonably large sketch for each
exercise.) Describe geometrically what $T$ does to each vector $\mathbf{x}$
in $\mathbb{R}^2$.


13. T(x) = 

14. T(x) = 

15. T(x) = 

16. T(x) = 



17. Let T : R 2 

'5 


R 2 be a linear transformation that maps 


u = 


into 


2 

1 


and maps v = 


1 

3 


into 


-1 

3 


. Use the 


fact that T is linear to find the images under T of 3u, 2v, and 
3u + 2v. 


18. The figure shows vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$, along with the images
$T(\mathbf{u})$ and $T(\mathbf{v})$ under the action of a linear transformation
$T : \mathbb{R}^2 \to \mathbb{R}^2$. Copy this figure carefully, and draw the image
$T(\mathbf{w})$ as accurately as possible. [Hint: First, write $\mathbf{w}$ as a
linear combination of $\mathbf{u}$ and $\mathbf{v}$.]



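19. Let $\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $\mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$, $\mathbf{y}_1 = \begin{bmatrix} 2 \\ 5 \end{bmatrix}$, and $\mathbf{y}_2 = \begin{bmatrix} -1 \\ 6 \end{bmatrix}$, and
let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be a linear transformation that maps $\mathbf{e}_1$
into $\mathbf{y}_1$ and maps $\mathbf{e}_2$ into $\mathbf{y}_2$. Find the images of $\begin{bmatrix} 5 \\ -3 \end{bmatrix}$ and
$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$.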

20. Let x = 


X\ 

X 2 




and v 2 



, and let 


$T : \mathbb{R}^2 \to \mathbb{R}^2$ be a linear transformation that maps $\mathbf{x}$ into
$x_1\mathbf{v}_1 + x_2\mathbf{v}_2$. Find a matrix $A$ such that $T(\mathbf{x})$ is $A\mathbf{x}$ for
each $\mathbf{x}$.


In Exercises 21 and 22, mark each statement True or False. Justify 
each answer. 


21. a. A linear transformation is a special type of function.

b. If $A$ is a $3 \times 5$ matrix and $T$ is a transformation defined
by $T(\mathbf{x}) = A\mathbf{x}$, then the domain of $T$ is $\mathbb{R}^3$.

c. If $A$ is an $m \times n$ matrix, then the range of the transformation
$\mathbf{x} \mapsto A\mathbf{x}$ is $\mathbb{R}^m$.

d. Every linear transformation is a matrix transformation.

e. A transformation $T$ is linear if and only if
$T(c_1\mathbf{v}_1 + c_2\mathbf{v}_2) = c_1T(\mathbf{v}_1) + c_2T(\mathbf{v}_2)$ for all $\mathbf{v}_1$ and $\mathbf{v}_2$ in the
domain of $T$ and for all scalars $c_1$ and $c_2$.


22. a. Every matrix transformation is a linear transformation.

b. The codomain of the transformation $\mathbf{x} \mapsto A\mathbf{x}$ is the set of
all linear combinations of the columns of $A$.

c. If $T : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation and if $\mathbf{c}$ is
in $\mathbb{R}^m$, then a uniqueness question is “Is $\mathbf{c}$ in the range
of $T$?”

d. A linear transformation preserves the operations of vector
addition and scalar multiplication.

e. The superposition principle is a physical description of a
linear transformation.


Make two sketches similar to Figure 6 that illustrate properties
(i) and (ii) of a linear transformation.


24. Suppose vectors $\mathbf{v}_1, \ldots, \mathbf{v}_p$ span $\mathbb{R}^n$, and let $T : \mathbb{R}^n \to \mathbb{R}^n$ be
a linear transformation. Suppose $T(\mathbf{v}_i) = \mathbf{0}$ for $i = 1, \ldots, p$.
Show that $T$ is the zero transformation. That is, show that if
$\mathbf{x}$ is any vector in $\mathbb{R}^n$, then $T(\mathbf{x}) = \mathbf{0}$.

25. Given $\mathbf{v} \ne \mathbf{0}$ and $\mathbf{p}$ in $\mathbb{R}^n$, the line through $\mathbf{p}$ in the direction of
$\mathbf{v}$ has the parametric equation $\mathbf{x} = \mathbf{p} + t\mathbf{v}$. Show that a linear
transformation $T : \mathbb{R}^n \to \mathbb{R}^n$ maps this line onto another line
or onto a single point (a degenerate line).

26. Let $\mathbf{u}$ and $\mathbf{v}$ be linearly independent vectors in $\mathbb{R}^3$, and let $P$
be the plane through $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{0}$. The parametric equation
of $P$ is $\mathbf{x} = s\mathbf{u} + t\mathbf{v}$ (with $s$, $t$ in $\mathbb{R}$). Show that a linear
transformation $T : \mathbb{R}^3 \to \mathbb{R}^3$ maps $P$ onto a plane through
$\mathbf{0}$, or onto a line through $\mathbf{0}$, or onto just the origin in $\mathbb{R}^3$. What
must be true about $T(\mathbf{u})$ and $T(\mathbf{v})$ in order for the image of
the plane $P$ to be a plane?

27. a. Show that the line through vectors $\mathbf{p}$ and $\mathbf{q}$ in $\mathbb{R}^n$ may be
written in the parametric form $\mathbf{x} = (1 - t)\mathbf{p} + t\mathbf{q}$. (Refer
to the figure with Exercises 21 and 22 in Section 1.5.)

b. The line segment from $\mathbf{p}$ to $\mathbf{q}$ is the set of points of the
form $(1 - t)\mathbf{p} + t\mathbf{q}$ for $0 \le t \le 1$ (as shown in the figure
below). Show that a linear transformation $T$ maps this
line segment onto a line segment or onto a single point.

[Figure: the segment from $\mathbf{p}$ ($t = 0$) to $\mathbf{q}$ ($t = 1$), with a typical point $(1 - t)\mathbf{p} + t\mathbf{q}$.]

28. Let $\mathbf{u}$ and $\mathbf{v}$ be vectors in $\mathbb{R}^n$. It can be shown that the set $P$ of
all points in the parallelogram determined by $\mathbf{u}$ and $\mathbf{v}$ has the
form $a\mathbf{u} + b\mathbf{v}$, for $0 \le a \le 1$, $0 \le b \le 1$. Let $T : \mathbb{R}^n \to \mathbb{R}^m$
be a linear transformation. Explain why the image of a point
in $P$ under the transformation $T$ lies in the parallelogram
determined by $T(\mathbf{u})$ and $T(\mathbf{v})$.


29. Define $f : \mathbb{R} \to \mathbb{R}$ by $f(x) = mx + b$.

a. Show that $f$ is a linear transformation when $b = 0$.

b. Find a property of a linear transformation that is violated
when $b \ne 0$.

c. Why is $f$ called a linear function?

30. An affine transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ has the form
$T(\mathbf{x}) = A\mathbf{x} + \mathbf{b}$, with $A$ an $m \times n$ matrix and $\mathbf{b}$ in $\mathbb{R}^m$. Show
that $T$ is not a linear transformation when $\mathbf{b} \ne \mathbf{0}$. (Affine
transformations are important in computer graphics.)

31. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation, and let
$\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ be a linearly dependent set in $\mathbb{R}^n$. Explain why
the set $\{T(\mathbf{v}_1), T(\mathbf{v}_2), T(\mathbf{v}_3)\}$ is linearly dependent.


In Exercises 32-36, column vectors are written as rows, such as
$\mathbf{x} = (x_1, x_2)$, and $T(\mathbf{x})$ is written as $T(x_1, x_2)$.


23. Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be the linear transformation that reflects
each point through the $x_1$-axis. (See Practice Problem 2.)


32. Show that the transformation $T$ defined by $T(x_1, x_2) =
(4x_1 - 2x_2, 3|x_2|)$ is not linear.

33. Show that the transformation $T$ defined by $T(x_1, x_2) =
(2x_1 - 3x_2, x_1 + 4, 5x_2)$ is not linear.

34. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation. Show that if
$T$ maps two linearly independent vectors onto a linearly
dependent set, then the equation $T(\mathbf{x}) = \mathbf{0}$ has a nontrivial
solution. [Hint: Suppose $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$ are linearly independent
and yet $T(\mathbf{u})$ and $T(\mathbf{v})$ are linearly dependent. Then
$c_1T(\mathbf{u}) + c_2T(\mathbf{v}) = \mathbf{0}$ for some weights $c_1$ and $c_2$, not both
zero. Use this equation.]

35. Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be the transformation that reflects each
vector $\mathbf{x} = (x_1, x_2, x_3)$ through the plane $x_3 = 0$ onto
$T(\mathbf{x}) = (x_1, x_2, -x_3)$. Show that $T$ is a linear transformation.
[See Example 4 for ideas.]

36. Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be the transformation that projects each
vector $\mathbf{x} = (x_1, x_2, x_3)$ onto the plane $x_2 = 0$, so $T(\mathbf{x}) =
(x_1, 0, x_3)$. Show that $T$ is a linear transformation.



1_ 

-2 

5 

1 


"-9 

-4 

-9 

1 

-9 

7 

00 

0 

38. 

5 

00 

-7 

6 

-6 

4 

5 

3 

7 

11 

16 

-9 

in 

_1 

-3 

00 

-4 


_ 1 

-7 

-4 

Ul 

1_ 


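39. [M] Let $\mathbf{b} = \begin{bmatrix} 7 \\ 5 \\ 9 \\ 7 \end{bmatrix}$ and let $A$ be the matrix in Exercise 37. Is
$\mathbf{b}$ in the range of the transformation $\mathbf{x} \mapsto A\mathbf{x}$? If so, find an $\mathbf{x}$
whose image under the transformation is $\mathbf{b}$.

40. [M] Let $\mathbf{b} = \begin{bmatrix} -7 \\ -7 \\ 13 \\ -5 \end{bmatrix}$ and let $A$ be the matrix in Exercise 38.
Is $\mathbf{b}$ in the range of the transformation $\mathbf{x} \mapsto A\mathbf{x}$? If so, find an
$\mathbf{x}$ whose image under the transformation is $\mathbf{b}$.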


[M] In Exercises 37 and 38, the given matrix determines a linear 
transformation T. Find all x such that T(x) = 0. 
SOLUTIONS TO PRACTICE PROBLEMS

[Figure: The transformation $\mathbf{x} \mapsto A\mathbf{x}$.]

1. $A$ must have five columns for $A\mathbf{x}$ to be defined. $A$ must have two rows for the
codomain of $T$ to be $\mathbb{R}^2$.

2. Plot some random points (vectors) on graph paper to see what happens. A point such
as $(4, 1)$ maps into $(4, -1)$. The transformation $\mathbf{x} \mapsto A\mathbf{x}$ reflects points through the
$x$-axis (or $x_1$-axis).

3. Let $\mathbf{x} = t\mathbf{u}$ for some $t$ such that $0 \le t \le 1$. Since $T$ is linear, $T(t\mathbf{u}) = tT(\mathbf{u})$, which
is a point on the line segment between $\mathbf{0}$ and $T(\mathbf{u})$.


1.9 THE MATRIX OF A LINEAR TRANSFORMATION 



Whenever a linear transformation $T$ arises geometrically or is described in words, we
usually want a “formula” for $T(\mathbf{x})$. The discussion that follows shows that every linear
transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$ is actually a matrix transformation $\mathbf{x} \mapsto A\mathbf{x}$ and that
important properties of $T$ are intimately related to familiar properties of $A$. The key to
finding $A$ is to observe that $T$ is completely determined by what it does to the columns
of the $n \times n$ identity matrix $I_n$.


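EXAMPLE 1 The columns of $I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ are $\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$.
Suppose $T$ is a linear transformation from $\mathbb{R}^2$ into $\mathbb{R}^3$ such that

$$
T(\mathbf{e}_1) = \begin{bmatrix} 5 \\ -7 \\ 2 \end{bmatrix}
\quad \text{and} \quad
T(\mathbf{e}_2) = \begin{bmatrix} -3 \\ 8 \\ 0 \end{bmatrix}
$$

With no additional information, find a formula for the image of an arbitrary $\mathbf{x}$ in $\mathbb{R}^2$.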
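SOLUTION Write

$$
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= x_1 \begin{bmatrix} 1 \\ 0 \end{bmatrix} + x_2 \begin{bmatrix} 0 \\ 1 \end{bmatrix}
= x_1\mathbf{e}_1 + x_2\mathbf{e}_2 \tag{1}
$$

Since $T$ is a linear transformation,

$$
T(\mathbf{x}) = x_1T(\mathbf{e}_1) + x_2T(\mathbf{e}_2) \tag{2}
$$

$$
= x_1 \begin{bmatrix} 5 \\ -7 \\ 2 \end{bmatrix} + x_2 \begin{bmatrix} -3 \\ 8 \\ 0 \end{bmatrix}
= \begin{bmatrix} 5x_1 - 3x_2 \\ -7x_1 + 8x_2 \\ 2x_1 + 0 \end{bmatrix}
$$

■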


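The step from equation (1) to equation (2) explains why knowledge of $T(\mathbf{e}_1)$ and
$T(\mathbf{e}_2)$ is sufficient to determine $T(\mathbf{x})$ for any $\mathbf{x}$. Moreover, since (2) expresses $T(\mathbf{x})$ as
a linear combination of vectors, we can put these vectors into the columns of a matrix
$A$ and write (2) as

$$
T(\mathbf{x}) = [\,T(\mathbf{e}_1) \;\; T(\mathbf{e}_2)\,] \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = A\mathbf{x}
$$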



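THEOREM 10

Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation. Then there exists a unique matrix
$A$ such that

$$
T(\mathbf{x}) = A\mathbf{x} \quad \text{for all } \mathbf{x} \text{ in } \mathbb{R}^n
$$

In fact, $A$ is the $m \times n$ matrix whose $j$th column is the vector $T(\mathbf{e}_j)$, where $\mathbf{e}_j$ is
the $j$th column of the identity matrix in $\mathbb{R}^n$:

$$
A = [\,T(\mathbf{e}_1) \;\cdots\; T(\mathbf{e}_n)\,] \tag{3}
$$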


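PROOF Write $\mathbf{x} = I_n\mathbf{x} = [\,\mathbf{e}_1 \;\cdots\; \mathbf{e}_n\,]\mathbf{x} = x_1\mathbf{e}_1 + \cdots + x_n\mathbf{e}_n$, and use the linearity
of $T$ to compute

$$
T(\mathbf{x}) = T(x_1\mathbf{e}_1 + \cdots + x_n\mathbf{e}_n) = x_1T(\mathbf{e}_1) + \cdots + x_nT(\mathbf{e}_n)
= [\,T(\mathbf{e}_1) \;\cdots\; T(\mathbf{e}_n)\,] \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = A\mathbf{x}
$$

The uniqueness of $A$ is treated in Exercise 33.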
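The construction in Theorem 10 translates directly into a short computation: apply $T$ to each column of the identity matrix and use the results as the columns of $A$. The sketch below is one possible Python rendering, not a prescribed implementation. The helper names `standard_matrix` and `mat_vec` are hypothetical, and the function `T` encodes the formula found in Example 1.

```python
def standard_matrix(T, n):
    """Return the standard matrix of a linear map T on R^n as a list of rows.

    Column j of the result is T(e_j), where e_j is the j-th column of the
    n x n identity matrix.
    """
    columns = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]
        columns.append(list(T(e_j)))
    m = len(columns[0])
    # Transpose the list of columns into a list of rows.
    return [[columns[j][i] for j in range(n)] for i in range(m)]

def mat_vec(A, x):
    """Compute the matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def T(x):
    """The transformation from Example 1, written as a formula."""
    x1, x2 = x
    return (5 * x1 - 3 * x2, -7 * x1 + 8 * x2, 2 * x1)

A = standard_matrix(T, 2)
```

For any vector `x`, `mat_vec(A, x)` should agree with `T(x)`, which is exactly the statement $T(\mathbf{x}) = A\mathbf{x}$.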


The matrix $A$ in (3) is called the standard matrix for the linear transformation $T$.

We know now that every linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$ can be viewed as
a matrix transformation, and vice versa. The term linear transformation focuses on a
property of a mapping, while matrix transformation describes how such a mapping is
implemented, as Examples 2 and 3 illustrate.

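EXAMPLE 2 Find the standard matrix $A$ for the dilation transformation $T(\mathbf{x}) = 3\mathbf{x}$,
for $\mathbf{x}$ in $\mathbb{R}^2$.

SOLUTION Write

$$
T(\mathbf{e}_1) = 3\mathbf{e}_1 = \begin{bmatrix} 3 \\ 0 \end{bmatrix}
\quad \text{and} \quad
T(\mathbf{e}_2) = 3\mathbf{e}_2 = \begin{bmatrix} 0 \\ 3 \end{bmatrix},
\qquad
A = \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix}
$$

FIGURE 2 The unit square.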


EXAMPLE 3 Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be the transformation that rotates each point in $\mathbb{R}^2$
about the origin through an angle $\varphi$, with counterclockwise rotation for a positive angle.
We could show geometrically that such a transformation is linear. (See Figure 6 in
Section 1.8.) Find the standard matrix $A$ of this transformation.

SOLUTION $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ rotates into $\begin{bmatrix} \cos\varphi \\ \sin\varphi \end{bmatrix}$, and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ rotates into $\begin{bmatrix} -\sin\varphi \\ \cos\varphi \end{bmatrix}$. See Figure 1.
By Theorem 10,

$$
A = \begin{bmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{bmatrix}
$$

Example 5 in Section 1.8 is a special case of this transformation, with $\varphi = \pi/2$.

FIGURE 1 A rotation transformation.
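As a quick numerical check of the rotation matrix, the sketch below builds $A$ for a given angle and applies it to a vector. It is only an illustration in Python: the names `rotation_matrix` and `mat_vec` and the sample vector $(4, 1)$ are assumptions chosen for the example, not taken from the text.

```python
import math

def rotation_matrix(phi):
    """Standard matrix of the counterclockwise rotation of R^2 through phi radians."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s],
            [s,  c]]

def mat_vec(A, x):
    """Compute the matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# phi = pi/2 gives the 90-degree rotation discussed in Example 5 of Section 1.8.
A = rotation_matrix(math.pi / 2)

# Hypothetical sample vector; a quarter turn should send (4, 1) to (-1, 4),
# up to floating-point round-off.
image = mat_vec(A, [4.0, 1.0])
```

Because `math.cos(math.pi / 2)` is a tiny nonzero number rather than exactly 0, the entries of `image` should be compared with a small tolerance rather than tested for exact equality.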


Geometric Linear Transformations of R 2 

Examples 2 and 3 illustrate linear transformations that are described geometrically. 
Tables 1-4 illustrate other common geometric linear transformations of the plane. 
Because the transformations are linear, they are determined completely by what they 
do to the columns of $I_2$. Instead of showing only the images of $\mathbf{e}_1$ and $\mathbf{e}_2$, the tables
show what a transformation does to the unit square (Figure 2). 

Other transformations can be constructed from those listed in Tables 1-4 by 
applying one transformation after another. For instance, a horizontal shear could be 
followed by a reflection in the $x_2$-axis. Section 2.1 will show that such a composition of
linear transformations is linear. (Also, see Exercise 36.) 


Existence and Uniqueness Questions 

The concept of a linear transformation provides a new way to understand the existence 
and uniqueness questions asked earlier. The two definitions following Tables 1-4 give 
the appropriate terminology for transformations. 
TABLE 1 Reflections

Transformation                              Standard Matrix

Reflection through the $x_1$-axis           $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$

Reflection through the line $x_2 = x_1$     $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$

Reflection through the line $x_2 = -x_1$    $\begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}$

Reflection through the origin               $\begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$
TABLE 2 Contractions and Expansions

Transformation                              Standard Matrix

Horizontal contraction and expansion        $\begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}$
(contraction for $0 < k < 1$, expansion for $k > 1$)

Vertical contraction and expansion          $\begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix}$
(contraction for $0 < k < 1$, expansion for $k > 1$)

TABLE 3 Shears

Transformation                              Standard Matrix

Horizontal shear                            $\begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}$

Vertical shear                              $\begin{bmatrix} 1 & 0 \\ k & 1 \end{bmatrix}$
TABLE 4 Projections

Transformation                              Standard Matrix

Projection onto the $x_1$-axis              $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$

Projection onto the $x_2$-axis              $\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$
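The effect of the standard matrices in Tables 1-4 on the unit square can be explored by applying each matrix to the four corners of the square. The following Python sketch does this for one matrix from each table. The value `k = 2`, the dictionary layout, and the helper name `mat_vec` are illustrative assumptions.

```python
def mat_vec(A, x):
    """Compute the matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Corners of the unit square, listed counterclockwise from the origin.
unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]

k = 2  # hypothetical value of the parameter k used in Tables 2 and 3

transformations = {
    "reflection through x1-axis": [[1, 0], [0, -1]],
    "horizontal expansion":       [[k, 0], [0, 1]],
    "horizontal shear":           [[1, k], [0, 1]],
    "projection onto x1-axis":    [[1, 0], [0, 0]],
}

# Image of each corner under each transformation.
images = {
    name: [tuple(mat_vec(A, corner)) for corner in unit_square]
    for name, A in transformations.items()
}
```

Since each transformation is linear, the image of the whole square is determined by these corner images. The projection, for example, collapses the square onto a segment of the $x_1$-axis.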


DEFINITION

A mapping $T : \mathbb{R}^n \to \mathbb{R}^m$ is said to be onto $\mathbb{R}^m$ if each $\mathbf{b}$ in $\mathbb{R}^m$ is the image of
at least one $\mathbf{x}$ in $\mathbb{R}^n$.


Equivalently, $T$ is onto $\mathbb{R}^m$ when the range of $T$ is all of the codomain $\mathbb{R}^m$. That is,
$T$ maps $\mathbb{R}^n$ onto $\mathbb{R}^m$ if, for each $\mathbf{b}$ in the codomain $\mathbb{R}^m$, there exists at least one solution
of $T(\mathbf{x}) = \mathbf{b}$. “Does $T$ map $\mathbb{R}^n$ onto $\mathbb{R}^m$?” is an existence question. The mapping $T$ is
not onto when there is some $\mathbf{b}$ in $\mathbb{R}^m$ for which the equation $T(\mathbf{x}) = \mathbf{b}$ has no solution.
See Figure 3.



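FIGURE 3 Is the range of $T$ all of $\mathbb{R}^m$? (Two panels, each showing the domain $\mathbb{R}^n$ and the range of $T$: in one, $T$ is not onto $\mathbb{R}^m$; in the other, $T$ is onto $\mathbb{R}^m$.)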


DEFINITION

A mapping $T : \mathbb{R}^n \to \mathbb{R}^m$ is said to be one-to-one if each $\mathbf{b}$ in $\mathbb{R}^m$ is the image
of at most one $\mathbf{x}$ in $\mathbb{R}^n$.
Equivalently, $T$ is one-to-one if, for each $\mathbf{b}$ in $\mathbb{R}^m$, the equation $T(\mathbf{x}) = \mathbf{b}$ has
either a unique solution or none at all. “Is $T$ one-to-one?” is a uniqueness question.
The mapping $T$ is not one-to-one when some $\mathbf{b}$ in $\mathbb{R}^m$ is the image of more than one
vector in $\mathbb{R}^n$. If there is no such $\mathbf{b}$, then $T$ is one-to-one. See Figure 4.

FIGURE 4 Is every $\mathbf{b}$ the image of at most one vector? (Two panels: in one, $T$ is not one-to-one; in the other, $T$ is one-to-one.)

The projection transformations shown in Table 4 are not one-to-one and do not map
$\mathbb{R}^2$ onto $\mathbb{R}^2$. The transformations in Tables 1, 2, and 3 are one-to-one and do map $\mathbb{R}^2$
onto $\mathbb{R}^2$. Other possibilities are shown in the two examples below.

Example 4 and the theorems that follow show how the function properties of being 
one-to-one and mapping onto are related to important concepts studied earlier in this 
chapter. 


EXAMPLE 4 Let $T$ be the linear transformation whose standard matrix is

$$
A = \begin{bmatrix} 1 & -4 & 8 & 1 \\ 0 & 2 & -1 & 3 \\ 0 & 0 & 0 & 5 \end{bmatrix}
$$

Does $T$ map $\mathbb{R}^4$ onto $\mathbb{R}^3$? Is $T$ a one-to-one mapping?

SOLUTION Since $A$ happens to be in echelon form, we can see at once that $A$ has a
pivot position in each row. By Theorem 4 in Section 1.4, for each $\mathbf{b}$ in $\mathbb{R}^3$, the equation
$A\mathbf{x} = \mathbf{b}$ is consistent. In other words, the linear transformation $T$ maps $\mathbb{R}^4$ (its domain)
onto $\mathbb{R}^3$. However, since the equation $A\mathbf{x} = \mathbf{b}$ has a free variable (because there are four
variables and only three basic variables), each $\mathbf{b}$ is the image of more than one $\mathbf{x}$. That
is, $T$ is not one-to-one.
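The pivot-counting argument in Example 4 can be mechanized. The sketch below, written in Python with exact rational arithmetic, row reduces a matrix to echelon form and records its pivot columns. A pivot in every row corresponds to mapping onto the codomain, and a pivot in every column corresponds to having no free variables. The function name `pivot_columns` is a hypothetical helper, and this is a plain forward-elimination sketch rather than a numerically tuned routine.

```python
from fractions import Fraction

def pivot_columns(A):
    """Row reduce a copy of A to echelon form and return its pivot column indices."""
    M = [[Fraction(v) for v in row] for row in A]
    rows, cols = len(M), len(M[0])
    pivots = []
    r = 0  # index of the row where the next pivot will be placed
    for c in range(cols):
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column (it corresponds to a free variable)
        M[r], M[p] = M[p], M[r]
        # Create zeros below the pivot.
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    return pivots

# Standard matrix from Example 4.
A = [[1, -4, 8, 1],
     [0, 2, -1, 3],
     [0, 0, 0, 5]]

pivots = pivot_columns(A)
maps_onto = len(pivots) == len(A)        # pivot position in every row
one_to_one = len(pivots) == len(A[0])    # pivot position in every column
```

For the matrix of Example 4, the pivots fall in the first, second, and fourth columns, so the third variable is free.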


THEOREM 11

Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation. Then $T$ is one-to-one if and only if
the equation $T(\mathbf{x}) = \mathbf{0}$ has only the trivial solution.


Remark: To prove a theorem that says “statement P is true if and only if statement Q is 
true,” one must establish two things: (1) If P is true, then Q is true and (2) If Q is true, 
then P is true. The second requirement can also be established by showing (2a): If P is 
false, then Q is false. (This is called contrapositive reasoning.) This proof uses (1) and 
(2a) to show that P and Q are either both true or both false. 

PROOF Since $T$ is linear, $T(\mathbf{0}) = \mathbf{0}$. If $T$ is one-to-one, then the equation $T(\mathbf{x}) = \mathbf{0}$
has at most one solution and hence only the trivial solution. If $T$ is not one-to-one, then
there is a $\mathbf{b}$ that is the image of at least two different vectors in $\mathbb{R}^n$, say, $\mathbf{u}$ and $\mathbf{v}$. That
is, $T(\mathbf{u}) = \mathbf{b}$ and $T(\mathbf{v}) = \mathbf{b}$. But then, since $T$ is linear,

$$
T(\mathbf{u} - \mathbf{v}) = T(\mathbf{u}) - T(\mathbf{v}) = \mathbf{b} - \mathbf{b} = \mathbf{0}
$$

The vector $\mathbf{u} - \mathbf{v}$ is not zero, since $\mathbf{u} \ne \mathbf{v}$. Hence the equation $T(\mathbf{x}) = \mathbf{0}$ has more than
one solution. So, either the two conditions in the theorem are both true or they are both
false. ■
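The key step of the proof, that two vectors with the same image produce a nonzero solution of $T(\mathbf{x}) = \mathbf{0}$, can be seen concretely with the projection onto the $x_1$-axis from Table 4. The Python sketch below is only an illustration, and the vectors `u` and `v` are hypothetical choices.

```python
def mat_vec(A, x):
    """Compute the matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Projection onto the x1-axis (Table 4), which is not one-to-one.
P = [[1, 0],
     [0, 0]]

# Two different vectors (hypothetical) with the same first entry.
u = [3, 5]
v = [3, -2]

same_image = mat_vec(P, u) == mat_vec(P, v)

# As in the proof, u - v is a nonzero vector that T sends to the zero vector.
difference = [a - b for a, b in zip(u, v)]
in_kernel = mat_vec(P, difference) == [0, 0]
```

Here `difference` is $(0, 7)$, a nontrivial solution of $P\mathbf{x} = \mathbf{0}$, in line with Theorem 11.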
THEOREM 12

Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation, and let $A$ be the standard matrix for
$T$. Then:

a. $T$ maps $\mathbb{R}^n$ onto $\mathbb{R}^m$ if and only if the columns of $A$ span $\mathbb{R}^m$;

b. $T$ is one-to-one if and only if the columns of $A$ are linearly independent.


Remark: “If and only if” statements can be linked together. For example, if “P if and
only if Q” is known and “Q if and only if R” is known, then one can conclude “P if
and only if R.” This strategy is used repeatedly in this proof.


PROOF

a. By Theorem 4 in Section 1.4, the columns of $A$ span $\mathbb{R}^m$ if and only if for each $\mathbf{b}$
in $\mathbb{R}^m$ the equation $A\mathbf{x} = \mathbf{b}$ is consistent; in other words, if and only if for every $\mathbf{b}$,
the equation $T(\mathbf{x}) = \mathbf{b}$ has at least one solution. This is true if and only if $T$ maps
$\mathbb{R}^n$ onto $\mathbb{R}^m$.

b. The equations $T(\mathbf{x}) = \mathbf{0}$ and $A\mathbf{x} = \mathbf{0}$ are the same except for notation. So, by
Theorem 11, $T$ is one-to-one if and only if $A\mathbf{x} = \mathbf{0}$ has only the trivial solution. This
happens if and only if the columns of $A$ are linearly independent, as was already
noted in the boxed statement (3) in Section 1.7. ■




[Figure: The transformation $T$ is not onto $\mathbb{R}^3$.]

Statement (a) in Theorem 12 is equivalent to the statement “$T$ maps $\mathbb{R}^n$ onto $\mathbb{R}^m$
if and only if every vector in $\mathbb{R}^m$ is a linear combination of the columns of $A$.” See
Theorem 4 in Section 1.4.

In the next example and in some exercises that follow, column vectors are written in
rows, such as $\mathbf{x} = (x_1, x_2)$, and $T(\mathbf{x})$ is written as $T(x_1, x_2)$ instead of the more formal
$T((x_1, x_2))$.


EXAMPLE 5 Let $T(x_1, x_2) = (3x_1 + x_2,\; 5x_1 + 7x_2,\; x_1 + 3x_2)$. Show that $T$ is a
one-to-one linear transformation. Does $T$ map $\mathbb{R}^2$ onto $\mathbb{R}^3$?

SOLUTION When $\mathbf{x}$ and $T(\mathbf{x})$ are written as column vectors, you can determine the
standard matrix of $T$ by inspection, visualizing the row-vector computation of each
entry in $A\mathbf{x}$.

$$
T(\mathbf{x}) = \begin{bmatrix} 3x_1 + x_2 \\ 5x_1 + 7x_2 \\ x_1 + 3x_2 \end{bmatrix}
= \begin{bmatrix} ? & ? \\ ? & ? \\ ? & ? \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \underbrace{\begin{bmatrix} 3 & 1 \\ 5 & 7 \\ 1 & 3 \end{bmatrix}}_{A} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \tag{4}
$$

So $T$ is indeed a linear transformation, with its standard matrix $A$ shown in (4). The
columns of $A$ are linearly independent because they are not multiples. By Theorem
12(b), $T$ is one-to-one. To decide if $T$ is onto $\mathbb{R}^3$, examine the span of the columns of
$A$. Since $A$ is $3 \times 2$, the columns of $A$ span $\mathbb{R}^3$ if and only if $A$ has 3 pivot positions, by
Theorem 4. This is impossible, since $A$ has only 2 columns. So the columns of $A$ do not
span $\mathbb{R}^3$, and the associated linear transformation is not onto $\mathbb{R}^3$. ■


PRACTICE PROBLEMS 


1. Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be the transformation that first performs a horizontal shear that
maps $\mathbf{e}_2$ into $\mathbf{e}_2 - .5\mathbf{e}_1$ (but leaves $\mathbf{e}_1$ unchanged) and then reflects the result through
the $x_2$-axis. Assuming that $T$ is linear, find its standard matrix. [Hint: Determine the
final location of the images of $\mathbf{e}_1$ and $\mathbf{e}_2$.]

2. Suppose $A$ is a $7 \times 5$ matrix with 5 pivots. Let $T(\mathbf{x}) = A\mathbf{x}$ be a linear transformation
from $\mathbb{R}^5$ into $\mathbb{R}^7$. Is $T$ a one-to-one linear transformation? Is $T$ onto $\mathbb{R}^7$?
1.9 EXERCISES

In Exercises 1-10, assume that $T$ is a linear transformation. Find
the standard matrix of $T$.

1. $T : \mathbb{R}^2 \to \mathbb{R}^4$, $T(\mathbf{e}_1) = (3, 1, 3, 1)$ and $T(\mathbf{e}_2) = (-5, 2, 0, 0)$,
where $\mathbf{e}_1 = (1, 0)$ and $\mathbf{e}_2 = (0, 1)$.

2. $T : \mathbb{R}^3 \to \mathbb{R}^2$, $T(\mathbf{e}_1) = (1, 3)$, $T(\mathbf{e}_2) = (4, -7)$, and
$T(\mathbf{e}_3) = (-5, 4)$, where $\mathbf{e}_1$, $\mathbf{e}_2$, $\mathbf{e}_3$ are the columns of the
$3 \times 3$ identity matrix.

3. $T : \mathbb{R}^2 \to \mathbb{R}^2$ rotates points (about the origin) through $3\pi/2$
radians (counterclockwise).

4. $T : \mathbb{R}^2 \to \mathbb{R}^2$ rotates points (about the origin) through $-\pi/4$
radians (clockwise). [Hint: $T(\mathbf{e}_1) = (1/\sqrt{2}, -1/\sqrt{2})$.]

5. $T : \mathbb{R}^2 \to \mathbb{R}^2$ is a vertical shear transformation that maps $\mathbf{e}_1$
into $\mathbf{e}_1 - 2\mathbf{e}_2$ but leaves the vector $\mathbf{e}_2$ unchanged.

6. $T : \mathbb{R}^2 \to \mathbb{R}^2$ is a horizontal shear transformation that leaves
$\mathbf{e}_1$ unchanged and maps $\mathbf{e}_2$ into $\mathbf{e}_2 + 3\mathbf{e}_1$.

7. $T : \mathbb{R}^2 \to \mathbb{R}^2$ first rotates points through $-3\pi/4$ radian
(clockwise) and then reflects points through the horizontal
$x_1$-axis. [Hint: $T(\mathbf{e}_1) = (-1/\sqrt{2}, 1/\sqrt{2})$.]

8. $T : \mathbb{R}^2 \to \mathbb{R}^2$ first reflects points through the horizontal
$x_1$-axis and then reflects points through the line $x_2 = x_1$.

9. $T : \mathbb{R}^2 \to \mathbb{R}^2$ first performs a horizontal shear that transforms
$\mathbf{e}_2$ into $\mathbf{e}_2 - 2\mathbf{e}_1$ (leaving $\mathbf{e}_1$ unchanged) and then reflects
points through the line $x_2 = -x_1$.

10. $T : \mathbb{R}^2 \to \mathbb{R}^2$ first reflects points through the vertical $x_2$-axis
and then rotates points $\pi/2$ radians.

11. A linear transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$ first reflects points
through the $x_1$-axis and then reflects points through the
$x_2$-axis. Show that $T$ can also be described as a linear transformation
that rotates points about the origin. What is the angle
of that rotation?

12. Show that the transformation in Exercise 8 is merely a rotation
about the origin. What is the angle of the rotation?

13. Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be the linear transformation such that $T(\mathbf{e}_1)$
and $T(\mathbf{e}_2)$ are the vectors shown in the figure. Using the
figure, sketch the vector $T(2, 1)$.



14. 


Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be a linear transformation with standard
matrix $A = [\,\mathbf{a}_1 \;\; \mathbf{a}_2\,]$, where $\mathbf{a}_1$ and $\mathbf{a}_2$ are shown in the
figure. Using the figure, draw the image of ' 


3 


under the 


transformation $T$.



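In Exercises 15 and 16, fill in the missing entries of the matrix,
assuming that the equation holds for all values of the variables.

15. $\begin{bmatrix} ? & ? & ? \\ ? & ? & ? \\ ? & ? & ? \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 3x_1 - 2x_3 \\ 4x_1 \\ x_1 - x_2 + x_3 \end{bmatrix}$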

16. 

r 2 

? 

• 

? 1 

? 

• 

Xl 


Xi - x 2 
—2xi + x 2 


? 

• 

? 

• 

_ X 2 _ 




In Exercises 17-20, show that $T$ is a linear transformation by
finding a matrix that implements the mapping. Note that $x_1, x_2, \ldots$
are not vectors but are entries in vectors.

17. $T(x_1, x_2, x_3, x_4) = (0, x_1 + x_2, x_2 + x_3, x_3 + x_4)$

18. $T(x_1, x_2) = (2x_2 - 3x_1, x_1 - 4x_2, 0, x_2)$

19. $T(x_1, x_2, x_3) = (x_1 - 5x_2 + 4x_3, x_2 - 6x_3)$

20. $T(x_1, x_2, x_3, x_4) = 2x_1 + 3x_3 - 4x_4$ ($T : \mathbb{R}^4 \to \mathbb{R}$)

21. Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be a linear transformation such that
$T(x_1, x_2) = (x_1 + x_2, 4x_1 + 5x_2)$. Find $\mathbf{x}$ such that $T(\mathbf{x}) =
(3, 8)$.

22. Let $T : \mathbb{R}^2 \to \mathbb{R}^3$ be a linear transformation such that
$T(x_1, x_2) = (x_1 - 2x_2, -x_1 + 3x_2, 3x_1 - 2x_2)$. Find $\mathbf{x}$ such
that $T(\mathbf{x}) = (-1, 4, 9)$.

In Exercises 23 and 24, mark each statement True or False. Justify 
each answer. 


23. a. A linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ is completely determined
by its effect on the columns of the $n \times n$ identity
matrix.

b. If $T : \mathbb{R}^2 \to \mathbb{R}^2$ rotates vectors about the origin through
an angle $\varphi$, then $T$ is a linear transformation.

c. When two linear transformations are performed one after
another, the combined effect may not always be a linear
transformation.

d. A mapping $T : \mathbb{R}^n \to \mathbb{R}^m$ is onto $\mathbb{R}^m$ if every vector $\mathbf{x}$ in
$\mathbb{R}^n$ maps onto some vector in $\mathbb{R}^m$.

e. If $A$ is a $3 \times 2$ matrix, then the transformation $\mathbf{x} \mapsto A\mathbf{x}$
cannot be one-to-one.


24. a. Not every linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$ is a matrix
transformation.

b. The columns of the standard matrix for a linear transformation
from $\mathbb{R}^n$ to $\mathbb{R}^m$ are the images of the columns of
the $n \times n$ identity matrix.

c. The standard matrix of a linear transformation from $\mathbb{R}^2$
to $\mathbb{R}^2$ that reflects points through the horizontal axis,
the vertical axis, or the origin has the form $\begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix}$,
where $a$ and $d$ are $\pm 1$.

d. A mapping $T : \mathbb{R}^n \to \mathbb{R}^m$ is one-to-one if each vector in
$\mathbb{R}^n$ maps onto a unique vector in $\mathbb{R}^m$.

e. If $A$ is a $3 \times 2$ matrix, then the transformation $\mathbf{x} \mapsto A\mathbf{x}$
cannot map $\mathbb{R}^2$ onto $\mathbb{R}^3$.



In Exercises 25-28, determine if the specified linear transformation
is (a) one-to-one and (b) onto. Justify each answer.

25. The transformation in Exercise 17

26. The transformation in Exercise 2

27. The transformation in Exercise 19

28. The transformation in Exercise 14

In Exercises 29 and 30, describe the possible echelon forms of the
standard matrix for a linear transformation $T$. Use the notation of
Example 1 in Section 1.2.

29. $T : \mathbb{R}^3 \to \mathbb{R}^4$ is one-to-one.

30. $T : \mathbb{R}^4 \to \mathbb{R}^3$ is onto.


31. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation, with $A$ its
standard matrix. Complete the following statement to make
it true: “$T$ is one-to-one if and only if $A$ has ____ pivot
columns.” Explain why the statement is true. [Hint: Look in
the exercises for Section 1.7.]

32. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation, with $A$ its
standard matrix. Complete the following statement to make
it true: “$T$ maps $\mathbb{R}^n$ onto $\mathbb{R}^m$ if and only if $A$ has ____
pivot columns.” Find some theorems that explain why the
statement is true.

33. Verify the uniqueness of $A$ in Theorem 10. Let $T : \mathbb{R}^n \to \mathbb{R}^m$
be a linear transformation such that $T(\mathbf{x}) = B\mathbf{x}$ for some
$m \times n$ matrix $B$. Show that if $A$ is the standard matrix for
$T$, then $A = B$. [Hint: Show that $A$ and $B$ have the same
columns.]

34. Why is the question “Is the linear transformation $T$ onto?”
an existence question?

35. If a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ maps $\mathbb{R}^n$ onto $\mathbb{R}^m$,
can you give a relation between $m$ and $n$? If $T$ is one-to-one,
what can you say about $m$ and $n$?

36. Let $S : \mathbb{R}^p \to \mathbb{R}^n$ and $T : \mathbb{R}^n \to \mathbb{R}^m$ be linear transformations.
Show that the mapping $\mathbf{x} \mapsto T(S(\mathbf{x}))$ is a linear transformation
(from $\mathbb{R}^p$ to $\mathbb{R}^m$). [Hint: Compute $T(S(c\mathbf{u} + d\mathbf{v}))$
for $\mathbf{u}$, $\mathbf{v}$ in $\mathbb{R}^p$ and scalars $c$ and $d$. Justify each step of the
computation, and explain why this computation gives the
desired conclusion.]


[M] In Exercises 37-40, let $T$ be the linear transformation whose
standard matrix is given. In Exercises 37 and 38, decide if $T$ is a
one-to-one mapping. In Exercises 39 and 40, decide if $T$ maps $\mathbb{R}^5$
onto $\mathbb{R}^5$. Justify your answers.


37. 


39 


40 


-5 

10 

-5 

4 


7 

5 

4 

-9 

OO 

3 

-4 

7 


38. 

10 

6 

16 

-4 

4 

-9 

5 

-3 


12 

8 

12 

7 

-3 

-2 

5 

4 _ 


-8 

-6 

-2 

5 

4 

-7 

3 

7 

5" 





6 

-8 

5 

12 

-8 





-7 

10 

-8 

-9 

14 





3 

-5 

4 

2 

-6 





-5 

6 

-6 

-7 

3 _ 





9 

13 

5 

6 

-1 " 





14 

15 

-7 

-6 

4 





OO 

-9 

12 

-5 

-9 





-5 

-6 

-8 

9 

8 





13 

14 

15 

2 

11 






SOLUTION TO PRACTICE PROBLEMS 



1. Follow what happens to $\mathbf{e}_1$ and $\mathbf{e}_2$. See Figure 5. First, $\mathbf{e}_1$ is unaffected by the shear
and then is reflected into $-\mathbf{e}_1$. So $T(\mathbf{e}_1) = -\mathbf{e}_1$. Second, $\mathbf{e}_2$ goes to $\mathbf{e}_2 - .5\mathbf{e}_1$ by the
shear transformation. Since reflection through the $x_2$-axis changes $\mathbf{e}_1$ into $-\mathbf{e}_1$ and
leaves $\mathbf{e}_2$ unchanged, the vector $\mathbf{e}_2 - .5\mathbf{e}_1$ goes to $\mathbf{e}_2 + .5\mathbf{e}_1$. So $T(\mathbf{e}_2) = \mathbf{e}_2 + .5\mathbf{e}_1$.
Thus the standard matrix of $T$ is

$$
[\,T(\mathbf{e}_1) \;\; T(\mathbf{e}_2)\,] = [\,-\mathbf{e}_1 \;\;\; \mathbf{e}_2 + .5\mathbf{e}_1\,] = \begin{bmatrix} -1 & .5 \\ 0 & 1 \end{bmatrix}
$$

FIGURE 5 The composition of two transformations.

2. The standard matrix representation of $T$ is the matrix $A$. Since $A$ has 5 columns and
5 pivots, there is a pivot in every column so the columns are linearly independent.
By Theorem 12, $T$ is one-to-one. Since $A$ has 7 rows and only 5 pivots, there is not
a pivot in every row and hence the columns of $A$ do not span $\mathbb{R}^7$. By Theorem 12,
$T$ is not onto $\mathbb{R}^7$.
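The reasoning in the solution to Practice Problem 1 can be replayed step by step: apply the shear, then the reflection, to $\mathbf{e}_1$ and to $\mathbf{e}_2$, and use the two results as the columns of the standard matrix. The Python sketch below does exactly that. The variable names are hypothetical, and the two matrices are the shear and reflection described in the problem statement.

```python
def mat_vec(A, x):
    """Compute the matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Horizontal shear: leaves e1 unchanged and maps e2 into e2 - .5 e1.
shear = [[1, -0.5],
         [0, 1]]

# Reflection through the x2-axis: e1 -> -e1, e2 -> e2.
reflect = [[-1, 0],
           [0, 1]]

def T(x):
    """First shear, then reflect, as in Practice Problem 1."""
    return mat_vec(reflect, mat_vec(shear, x))

# Columns of the standard matrix are T(e1) and T(e2) (Theorem 10).
col1 = T([1, 0])
col2 = T([0, 1])
standard = [[col1[0], col2[0]],
            [col1[1], col2[1]]]
```

The resulting `standard` matrix has rows $(-1, .5)$ and $(0, 1)$, the matrix found in the solution.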


1.10 LINEAR MODELS IN BUSINESS, SCIENCE, AND ENGINEERING 



The mathematical models in this section are all linear; that is, each describes a
problem by means of a linear equation, usually in vector or matrix form. The first
model concerns nutrition but actually is representative of a general technique in linear
programming problems. The second model comes from electrical engineering. The third
model introduces the concept of a linear difference equation, a powerful mathematical
tool for studying dynamic processes in a wide variety of fields such as engineering, 
ecology, economics, telecommunications, and the management sciences. Linear models 
are important because natural phenomena are often linear or nearly linear when the 
variables involved are held within reasonable bounds. Also, linear models are more 
easily adapted for computer calculation than are complex nonlinear models. 

As you read about each model, pay attention to how its linearity reflects some 
property of the system being modeled. 


Constructing a Nutritious Weight-Loss Diet 

The formula for the Cambridge Diet, a popular diet in the 1980s, was based on years 
of research. A team of scientists headed by Dr. Alan H. Howard developed this diet 
at Cambridge University after more than eight years of clinical work with obese 
patients. 1 The very low-calorie powdered formula diet combines a precise balance 
of carbohydrate, high-quality protein, and fat, together with vitamins, minerals, trace 
elements, and electrolytes. Millions of persons have used the diet to achieve rapid and 
substantial weight loss. 

To achieve the desired amounts and proportions of nutrients, Dr. Howard had to 
incorporate a large variety of foodstuffs in the diet. Each foodstuff supplied several of 
the required ingredients, but not in the correct proportions. For instance, nonfat milk was 
a major source of protein but contained too much calcium. So soy flour was used for 
part of the protein because soy flour contains little calcium. However, soy flour contains 
proportionally too much fat, so whey was added since it supplies less fat in relation to 
calcium. Unfortunately, whey contains too much carbohydrate. . . .

The following example illustrates the problem on a small scale. Listed in Table 1 
are three of the ingredients in the diet, together with the amounts of certain nutrients 
supplied by 100 grams (g) of each ingredient. 2 


1 The first announcement of this rapid weight-loss regimen was given in the International Journal of Obesity
(1978) 2, 321-332.

2 Ingredients in the diet as of 1984; nutrient data for ingredients adapted from USDA Agricultural
Handbooks No. 8-1 and 8-6, 1976.
TABLE 1

                Amounts (g) Supplied per 100 g of Ingredient    Amounts (g) Supplied by
Nutrient        Nonfat milk    Soy flour    Whey                Cambridge Diet in One Day

Protein         36             51           13                  33
Carbohydrate    52             34           74                  45
Fat             0              7            1.1                 3


EXAMPLE 1 If possible, find some combination of nonfat milk, soy flour, and whey
to provide the exact amounts of protein, carbohydrate, and fat supplied by the diet in 
one day (Table 1). 


SOLUTION Let $x_1$, $x_2$, and $x_3$, respectively, denote the number of units (100 g) of
these foodstuffs. One approach to the problem is to derive equations for each nutrient
separately. For instance, the product

$$\left\{ \begin{array}{c} x_1 \text{ units of} \\ \text{nonfat milk} \end{array} \right\} \cdot \left\{ \begin{array}{c} \text{protein per unit} \\ \text{of nonfat milk} \end{array} \right\}$$

gives the amount of protein supplied by $x_1$ units of nonfat milk. To this amount, we
would then add similar products for soy flour and whey and set the resulting sum equal
to the amount of protein we need. Analogous calculations would have to be made for
each nutrient.

A more efficient method, and one that is conceptually simpler, is to consider a
“nutrient vector” for each foodstuff and build just one vector equation. The amount
of nutrients supplied by $x_1$ units of nonfat milk is the scalar multiple

$$\underbrace{\left\{ \begin{array}{c} x_1 \text{ units of} \\ \text{nonfat milk} \end{array} \right\}}_{\text{Scalar}} \cdot \underbrace{\left\{ \begin{array}{c} \text{nutrients per unit} \\ \text{of nonfat milk} \end{array} \right\}}_{\text{Vector}} = x_1 \mathbf{a}_1 \tag{1}$$

where $\mathbf{a}_1$ is the first column in Table 1. Let $\mathbf{a}_2$ and $\mathbf{a}_3$ be the corresponding vectors for
soy flour and whey, respectively, and let $\mathbf{b}$ be the vector that lists the total nutrients
required (the last column of the table). Then $x_2 \mathbf{a}_2$ and $x_3 \mathbf{a}_3$ give the nutrients supplied
by $x_2$ units of soy flour and $x_3$ units of whey, respectively. So the relevant equation is

$$x_1 \mathbf{a}_1 + x_2 \mathbf{a}_2 + x_3 \mathbf{a}_3 = \mathbf{b} \tag{2}$$


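Row reduction of the augmented matrix for the corresponding system of equations
shows that

$$\begin{bmatrix} 36 & 51 & 13 & 33 \\ 52 & 34 & 74 & 45 \\ 0 & 7 & 1.1 & 3 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 0 & .277 \\ 0 & 1 & 0 & .392 \\ 0 & 0 & 1 & .233 \end{bmatrix}$$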


To three significant digits, the diet requires .277 units of nonfat milk, .392 units of 
soy flour, and .233 units of whey in order to provide the desired amounts of protein, 
carbohydrate, and fat. ■ 
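
The row reduction in Example 1 can be checked numerically. What follows is only a minimal sketch in Python, not the text's own procedure: the helper name `solve_linear_system` is hypothetical, and the routine assumes a square, invertible coefficient matrix. It works with exact fractions (the entry 1.1 is read as 11/10), so rounding happens only when the answer is printed.

```python
from fractions import Fraction


def solve_linear_system(aug):
    """Gauss-Jordan elimination on an augmented matrix [A | b].

    Illustrative helper (not from the text). Assumes A is square and
    invertible; entries are converted to exact fractions.
    """
    rows = [[Fraction(str(v)) for v in row] for row in aug]
    n = len(rows)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Scale the pivot row so that the pivot entry becomes 1.
        p = rows[col][col]
        rows[col] = [v / p for v in rows[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    # The last column now holds the solution.
    return [row[-1] for row in rows]


# Augmented matrix of equation (2): columns are nonfat milk, soy flour, whey, target.
diet = [
    [36, 51, 13, 33],   # protein
    [52, 34, 74, 45],   # carbohydrate
    [0, 7, 1.1, 3],     # fat
]
x1, x2, x3 = solve_linear_system(diet)
print(round(float(x1), 3), round(float(x2), 3), round(float(x3), 3))
```

The printed values should agree with the three-digit amounts quoted in the example, and all three are nonnegative, which is what makes the mixture physically feasible.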


It is important that the values of $x_1$, $x_2$, and $x_3$ found above are nonnegative. This is
necessary for the solution to be physically feasible. (How could you use $-.233$ units of
whey, for instance?) With a large number of nutrient requirements, it may be necessary
to use a larger number of foodstuffs in order to produce a system of equations with
a “nonnegative” solution. Thus many, many different combinations of foodstuffs may
need to be examined in order to find a system of equations with such a solution. In
fact, the manufacturer of the Cambridge Diet was able to supply 31 nutrients in precise
amounts using only 33 ingredients.

The diet construction problem leads to the linear equation (2) because the amount
of nutrients supplied by each foodstuff can be written as a scalar multiple of a vector, 
as in (1). That is, the nutrients supplied by a foodstuff are proportional to the amount of 
the foodstuff added to the diet mixture. Also, each nutrient in the mixture is the sum of 
the amounts from the various foodstuffs. 

Problems of formulating specialized diets for humans and livestock occur frequently.
Usually they are treated by linear programming techniques. Our method of
constructing vector equations often simplifies the task of formulating such problems. 


Linear Equations and Electrical Networks 




Current flow in a simple electrical network can be described by a system of linear 
equations. A voltage source such as a battery forces a current of electrons to flow through 
the network. When the current passes through a resistor (such as a lightbulb or motor), 
some of the voltage is “used up”; by Ohm’s law, this “voltage drop” across a resistor is 
given by 

V = RI 


where the voltage $V$ is measured in volts, the resistance $R$ in ohms (denoted by $\Omega$), and
the current flow $I$ in amperes (amps, for short).

The network in Figure 1 contains three closed loops. The currents flowing in loops
1, 2, and 3 are denoted by $I_1$, $I_2$, and $I_3$, respectively. The designated directions of such
loop currents are arbitrary. If a current turns out to be negative, then the actual direction
of current flow is opposite to that chosen in the figure. If the current direction shown is
away from the positive (longer) side of a battery (-| |-) around to the negative (shorter)
side, the voltage is positive; otherwise, the voltage is negative.

Current flow in a loop is governed by the following rule. 


KIRCHHOFF'S VOLTAGE LAW 

The algebraic sum of the RI voltage drops in one direction around a loop equals 
the algebraic sum of the voltage sources in the same direction around the loop. 






FIGURE 1 


EXAMPLE 2 Determine the loop currents in the network in Figure 1. 

SOLUTION For loop 1, the current $I_1$ flows through three resistors, and the sum of the
RI voltage drops is

$$4I_1 + 4I_1 + 3I_1 = (4 + 4 + 3)I_1 = 11I_1$$

Current from loop 2 also flows in part of loop 1, through the short branch between A
and B. The associated RI drop there is $3I_2$ volts. However, the current direction for the
branch AB in loop 1 is opposite to that chosen for the flow in loop 2, so the algebraic
sum of all RI drops for loop 1 is $11I_1 - 3I_2$. Since the voltage in loop 1 is +30 volts,
Kirchhoff’s voltage law implies that

$$11I_1 - 3I_2 = 30$$

The equation for loop 2 is

$$-3I_1 + 6I_2 - I_3 = 5$$

The term $-3I_1$ comes from the flow of the loop 1 current through the branch AB (with
a negative voltage drop because the current flow there is opposite to the flow in loop 2).
The term $6I_2$ is the sum of all resistances in loop 2, multiplied by the loop current. The
term $-I_3 = -1 \cdot I_3$ comes from the loop 3 current flowing through the 1-ohm resistor
in branch CD, in the direction opposite to the flow in loop 2. The loop 3 equation is

$$-I_2 + 3I_3 = -25$$

Note that the 5-volt battery in branch CD is counted as part of both loop 2 and loop 3,
but it is $-5$ volts for loop 3 because of the direction chosen for the current in loop 3.
The 20-volt battery is negative for the same reason.

The loop currents are found by solving the system

$$\begin{array}{rcrcrcr} 11I_1 & - & 3I_2 & & & = & 30 \\ -3I_1 & + & 6I_2 & - & I_3 & = & 5 \\ & - & I_2 & + & 3I_3 & = & -25 \end{array} \tag{3}$$

Row operations on the augmented matrix lead to the solution: $I_1 = 3$ amps, $I_2 = 1$ amp,
and $I_3 = -8$ amps. The negative value of $I_3$ indicates that the actual current in loop 3
flows in the direction opposite to that shown in Figure 1. ■
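
The loop currents of Example 2 can be confirmed with a few lines of Python. This is a sketch under stated assumptions rather than the method of the text: the example uses row operations, whereas the code below swaps in Cramer's rule (ratios of 3 x 3 determinants) because it is short for a system this small. The function names `det3` and `cramer3` are hypothetical.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def cramer3(R, v):
    """Solve R i = v for a 3x3 integer matrix R with nonzero determinant."""
    D = det3(R)
    if D == 0:
        raise ValueError("coefficient matrix is singular")
    solution = []
    for j in range(3):
        # Replace column j of R by the right-hand side v.
        Rj = [[v[r] if c == j else R[r][c] for c in range(3)] for r in range(3)]
        solution.append(Fraction(det3(Rj), D))
    return solution


# Coefficients and voltages of the loop equations in Example 2.
R = [[11, -3, 0],
     [-3, 6, -1],
     [0, -1, 3]]
v = [30, 5, -25]

I1, I2, I3 = cramer3(R, v)
print(I1, I2, I3)  # loop currents in amps
```

Exact fractions are used so that the integer currents come out without floating-point noise.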


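It is instructive to look at system (3) as a vector equation:

$$I_1 \underbrace{\begin{bmatrix} 11 \\ -3 \\ 0 \end{bmatrix}}_{\mathbf{r}_1} + I_2 \underbrace{\begin{bmatrix} -3 \\ 6 \\ -1 \end{bmatrix}}_{\mathbf{r}_2} + I_3 \underbrace{\begin{bmatrix} 0 \\ -1 \\ 3 \end{bmatrix}}_{\mathbf{r}_3} = \underbrace{\begin{bmatrix} 30 \\ 5 \\ -25 \end{bmatrix}}_{\mathbf{v}} \tag{4}$$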


The first entry of each vector concerns the first loop, and similarly for the second and
third entries. The first resistor vector $\mathbf{r}_1$ lists the resistance in the various loops through
which current $I_1$ flows. A resistance is written negatively when $I_1$ flows against the flow
direction in another loop. Examine Figure 1 and see how to compute the entries in $\mathbf{r}_1$;
then do the same for $\mathbf{r}_2$ and $\mathbf{r}_3$. The matrix form of equation (4),

$$R\mathbf{i} = \mathbf{v}, \quad \text{where } R = [\, \mathbf{r}_1 \ \ \mathbf{r}_2 \ \ \mathbf{r}_3 \,] \text{ and } \mathbf{i} = \begin{bmatrix} I_1 \\ I_2 \\ I_3 \end{bmatrix}$$

provides a matrix version of Ohm’s law. If all loop currents are chosen in the same direction
(say, counterclockwise), then all entries off the main diagonal of $R$ will be negative.

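The matrix equation $R\mathbf{i} = \mathbf{v}$ makes the linearity of this model easy to see at a
glance. For instance, if the voltage vector is doubled, then the current vector must
double. Also, a superposition principle holds. That is, the solution of equation (4) is
the sum of the solutions of the equations

$$R\mathbf{i} = \begin{bmatrix} 30 \\ 0 \\ 0 \end{bmatrix}, \quad R\mathbf{i} = \begin{bmatrix} 0 \\ 5 \\ 0 \end{bmatrix}, \quad \text{and} \quad R\mathbf{i} = \begin{bmatrix} 0 \\ 0 \\ -25 \end{bmatrix}$$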


Each equation here corresponds to the circuit with only one voltage source (the other 
sources being replaced by wires that close each loop). The model for current flow is 
linear precisely because Ohm’s law and Kirchhoff’s law are linear: The voltage drop 
across a resistor is proportional to the current flowing through it (Ohm), and the sum of 
the voltage drops in a loop equals the sum of the voltage sources in the loop (Kirchhoff). 
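
The superposition principle can be made concrete for this network. The worked check below is an added illustration, not part of the original example; the labels $\mathbf{i}_a$, $\mathbf{i}_b$, $\mathbf{i}_c$ are introduced here only for convenience, and the three single-source solutions were computed separately from the loop equations of Example 2.

```latex
% Solutions of the three single-source systems (one voltage source at a time):
\[
R\,\mathbf{i}_a = \begin{bmatrix} 30 \\ 0 \\ 0 \end{bmatrix}
\;\Rightarrow\;
\mathbf{i}_a = \frac{1}{16}\begin{bmatrix} 51 \\ 27 \\ 9 \end{bmatrix},
\qquad
R\,\mathbf{i}_b = \begin{bmatrix} 0 \\ 5 \\ 0 \end{bmatrix}
\;\Rightarrow\;
\mathbf{i}_b = \frac{1}{32}\begin{bmatrix} 9 \\ 33 \\ 11 \end{bmatrix},
\qquad
R\,\mathbf{i}_c = \begin{bmatrix} 0 \\ 0 \\ -25 \end{bmatrix}
\;\Rightarrow\;
\mathbf{i}_c = \frac{1}{32}\begin{bmatrix} -15 \\ -55 \\ -285 \end{bmatrix}
\]
% Adding them recovers the loop currents found for the full network:
\[
\mathbf{i}_a + \mathbf{i}_b + \mathbf{i}_c
= \frac{1}{32}\begin{bmatrix} 102 + 9 - 15 \\ 54 + 33 - 55 \\ 18 + 11 - 285 \end{bmatrix}
= \frac{1}{32}\begin{bmatrix} 96 \\ 32 \\ -256 \end{bmatrix}
= \begin{bmatrix} 3 \\ 1 \\ -8 \end{bmatrix}
\]
% This works because multiplication by R is linear:
\[
R(\mathbf{i}_a + \mathbf{i}_b + \mathbf{i}_c)
= R\,\mathbf{i}_a + R\,\mathbf{i}_b + R\,\mathbf{i}_c
= \begin{bmatrix} 30 \\ 5 \\ -25 \end{bmatrix}
\]
```

The sum reproduces the currents 3, 1, and -8 amps obtained by solving the full system at once.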

Loop currents in a network can be used to determine the current in any branch of
the network. If only one loop current passes through a branch, such as from B to D
in Figure 1, the branch current equals the loop current. If more than one loop current
passes through a branch, such as from A to B, the branch current is the algebraic sum
of the loop currents in the branch (Kirchhoff’s current law). For instance, the current in
branch AB is $I_1 - I_2 = 3 - 1 = 2$ amps, in the direction of $I_1$. The current in branch
CD is $I_2 - I_3 = 9$ amps.


Difference Equations

In many fields such as ecology, economics, and engineering, a need arises to model
mathematically a dynamic system that changes over time. Several features of the system
are each measured at discrete time intervals, producing a sequence of vectors $\mathbf{x}_0$, $\mathbf{x}_1$,
$\mathbf{x}_2$, .... The entries in $\mathbf{x}_k$ provide information about the state of the system at the time
of the kth measurement.

If there is a matrix $A$ such that $\mathbf{x}_1 = A\mathbf{x}_0$, $\mathbf{x}_2 = A\mathbf{x}_1$, and, in general,

$$\mathbf{x}_{k+1} = A\mathbf{x}_k \quad \text{for } k = 0, 1, 2, \ldots \tag{5}$$

then (5) is called a linear difference equation (or recurrence relation). Given such
an equation, one can compute $\mathbf{x}_1$, $\mathbf{x}_2$, and so on, provided $\mathbf{x}_0$ is known. Sections 4.8
and 4.9, and several sections in Chapter 5, will develop formulas for $\mathbf{x}_k$ and describe
what can happen to $\mathbf{x}_k$ as $k$ increases indefinitely. The discussion below illustrates how
a difference equation might arise.

A subject of interest to demographers is the movement of populations or groups of 
people from one region to another. The simple model here considers the changes in the 
population of a certain city and its surrounding suburbs over a period of years. 

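Fix an initial year (say, 2014) and denote the populations of the city and suburbs
that year by $r_0$ and $s_0$, respectively. Let $\mathbf{x}_0$ be the population vector

$$\mathbf{x}_0 = \begin{bmatrix} r_0 \\ s_0 \end{bmatrix} \quad \begin{array}{l} \text{City population, 2014} \\ \text{Suburban population, 2014} \end{array}$$

For 2015 and subsequent years, denote the populations of the city and suburbs by the
vectors

$$\mathbf{x}_1 = \begin{bmatrix} r_1 \\ s_1 \end{bmatrix}, \quad \mathbf{x}_2 = \begin{bmatrix} r_2 \\ s_2 \end{bmatrix}, \quad \mathbf{x}_3 = \begin{bmatrix} r_3 \\ s_3 \end{bmatrix}, \ \ldots$$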





Our goal is to describe mathematically how these vectors might be related. 

Suppose demographic studies show that each year about 5% of the city’s population 
moves to the suburbs (and 95% remains in the city), while 3% of the suburban population 
moves to the city (and 97% remains in the suburbs). See Figure 2. 



FIGURE 2 Annual percentage migration between city and suburbs. 


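After 1 year, the original $r_0$ persons in the city are now distributed between city and
suburbs as

$$\begin{bmatrix} .95r_0 \\ .05r_0 \end{bmatrix} = r_0 \begin{bmatrix} .95 \\ .05 \end{bmatrix} \quad \begin{array}{l} \text{Remain in city} \\ \text{Move to suburbs} \end{array} \tag{6}$$

The $s_0$ persons in the suburbs in 2014 are distributed 1 year later as

$$s_0 \begin{bmatrix} .03 \\ .97 \end{bmatrix} \quad \begin{array}{l} \text{Move to city} \\ \text{Remain in suburbs} \end{array} \tag{7}$$

The vectors in (6) and (7) account for all of the population in 2015. 3 Thus

$$\begin{bmatrix} r_1 \\ s_1 \end{bmatrix} = r_0 \begin{bmatrix} .95 \\ .05 \end{bmatrix} + s_0 \begin{bmatrix} .03 \\ .97 \end{bmatrix} = \begin{bmatrix} .95 & .03 \\ .05 & .97 \end{bmatrix} \begin{bmatrix} r_0 \\ s_0 \end{bmatrix}$$

That is,

$$\mathbf{x}_1 = M\mathbf{x}_0 \tag{8}$$

where $M$ is the migration matrix determined by the following table: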



        From:
    City    Suburbs       To:

  [ .95     .03 ]         City
  [ .05     .97 ]         Suburbs


Equation (8) describes how the population changes from 2014 to 2015. If the migration
percentages remain constant, then the change from 2015 to 2016 is given by

$$\mathbf{x}_2 = M\mathbf{x}_1$$

and similarly for 2016 to 2017 and subsequent years. In general,

$$\mathbf{x}_{k+1} = M\mathbf{x}_k \quad \text{for } k = 0, 1, 2, \ldots \tag{9}$$

The sequence of vectors $\{\mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2, \ldots\}$ describes the population of the city/suburban
region over a period of years.


EXAMPLE 3 Compute the population of the region just described for the years 
2015 and 2016, given that the population in 2014 was 600,000 in the city and 400,000 
in the suburbs. 


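SOLUTION The initial population in 2014 is $\mathbf{x}_0 = \begin{bmatrix} 600{,}000 \\ 400{,}000 \end{bmatrix}$. For 2015,

$$\mathbf{x}_1 = M\mathbf{x}_0 = \begin{bmatrix} .95 & .03 \\ .05 & .97 \end{bmatrix} \begin{bmatrix} 600{,}000 \\ 400{,}000 \end{bmatrix} = \begin{bmatrix} 582{,}000 \\ 418{,}000 \end{bmatrix}$$

For 2016,

$$\mathbf{x}_2 = M\mathbf{x}_1 = \begin{bmatrix} .95 & .03 \\ .05 & .97 \end{bmatrix} \begin{bmatrix} 582{,}000 \\ 418{,}000 \end{bmatrix} = \begin{bmatrix} 565{,}440 \\ 434{,}560 \end{bmatrix}$$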
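
The repeated multiplication in Example 3 is easy to automate. The following Python sketch is an added illustration under simple assumptions: the function names `step` and `trajectory` are hypothetical, and the migration rates are stored as exact fractions so the populations come out as whole numbers.

```python
from fractions import Fraction

# Migration matrix M from the example: columns are "from city" and "from suburbs".
M = [
    [Fraction(95, 100), Fraction(3, 100)],   # to city
    [Fraction(5, 100), Fraction(97, 100)],   # to suburbs
]


def step(M, x):
    """One year of the difference equation: return the product M x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def trajectory(M, x0, years):
    """Return the list [x_0, x_1, ..., x_years]."""
    states = [list(x0)]
    for _ in range(years):
        states.append(step(M, states[-1]))
    return states


states = trajectory(M, [600_000, 400_000], 2)
for year, (city, suburbs) in zip(range(2014, 2017), states):
    print(year, int(city), int(suburbs))
```

Because each column of the migration matrix sums to 1, the total population stays at 1,000,000 in every year; raising `years` to 20 gives the data asked for in Exercise 13.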


The model for population movement in (9) is linear because the correspondence
$\mathbf{x}_k \mapsto \mathbf{x}_{k+1}$ is a linear transformation. The linearity depends on two facts: the number
of people who chose to move from one area to another is proportional to the number of 
people in that area, as shown in (6) and (7), and the cumulative effect of these choices 
is found by adding the movement of people from the different areas. 


PRACTICE PROBLEM 

Find a matrix A and vectors x and b such that the problem in Example 1 amounts to 
solving the equation Ax = b. 


3 For simplicity, we ignore other influences on the population such as births, deaths, and migration into and
out of the city/suburban region.

1.10 EXERCISES

1. The container of a breakfast cereal usually lists the number 
of calories and the amounts of protein, carbohydrate, and 
fat contained in one serving of the cereal. The amounts for 
two common cereals are given below. Suppose a mixture of 
these two cereals is to be prepared that contains exactly 295 
calories, 9 g of protein, 48 g of carbohydrate, and 8 g of fat. 

a. Set up a vector equation for this problem. Include a statement
of what the variables in your equation represent.

b. Write an equivalent matrix equation, and then determine 
if the desired mixture of the two cereals can be prepared. 


Nutrition Information per Serving

Nutrient            General Mills Cheerios®    Quaker® 100% Natural Cereal

Calories            110                        130
Protein (g)         4                          3
Carbohydrate (g)    20                         18
Fat (g)             2                          5


2. One serving of Post Shredded Wheat® supplies 160 calories, 
5 g of protein, 6 g of fiber, and 1 g of fat. One serving of 
Crispix® supplies 110 calories, 2 g of protein, .1 g of fiber, 
and .4 g of fat. 

a. Set up a matrix B and a vector u such that B u gives the 
amounts of calories, protein, fiber, and fat contained in 
a mixture of three servings of Shredded Wheat and two 
servings of Crispix. 

b. [M] Suppose that you want a cereal with more fiber than 
Crispix but fewer calories than Shredded Wheat. Is it 
possible for a mixture of the two cereals to supply 130 
calories, 3.20 g of protein, 2.46 g of fiber, and .64 g of 
fat? If so, what is the mixture? 

3. After taking a nutrition class, a big Annie’s® Mac and Cheese 
fan decides to improve the levels of protein and fiber in 
her favorite lunch by adding broccoli and canned chicken. 
The nutritional information for the foods referred to in this 
exercise are given in the table below. 


Nutrition Information per Serving

Nutrient       Mac and Cheese    Broccoli    Chicken    Shells

Calories       270               51          70         260
Protein (g)    10                5.4         15         9
Fiber (g)      2                 5.2         0          5


a. [M] If she wants to limit her lunch to 400 calories but 
get 30 g of protein and 10 g of fiber, what proportions of 
servings of Mac and Cheese, broccoli, and chicken should 
she use? 

b. [M] She found that there was too much broccoli in the
proportions from part (a), so she decided to switch from
classical Mac and Cheese to Annie’s® Whole Wheat
Shells and White Cheddar. What proportions of servings
of each food should she use to meet the same goals as in
part (a)?

4. The Cambridge Diet supplies .8 g of calcium per day, in 
addition to the nutrients listed in Table 1 for Example 1. 
The amounts of calcium per unit (100 g) supplied by the 
three ingredients in the Cambridge Diet are as follows: 1.26 g 
from nonfat milk, .19 g from soy flour, and .8 g from whey. 
Another ingredient in the diet mixture is isolated soy protein, 
which provides the following nutrients in each unit: 80 g of 
protein, 0 g of carbohydrate, 3.4 g of fat, and .18 g of calcium. 

a. Set up a matrix equation whose solution determines the 
amounts of nonfat milk, soy flour, whey, and isolated 
soy protein necessary to supply the precise amounts of 
protein, carbohydrate, fat, and calcium in the Cambridge 
Diet. State what the variables in the equation represent. 

b. [M] Solve the equation in (a) and discuss your answer. 


In Exercises 5-8, write a matrix equation that determines the loop
currents. [M] If MATLAB or another matrix program is available,
solve the system for the loop currents.

9. In a certain region, about 7% of a city’s population moves
to the surrounding suburbs each year, and about 5% of the 
suburban population moves into the city. In 2015, there were 
800,000 residents in the city and 500,000 in the suburbs. 
Set up a difference equation that describes this situation, 
where $\mathbf{x}_0$ is the initial population in 2015. Then estimate
the populations in the city and in the suburbs two years 
later, in 2017. (Ignore other factors that might influence the 
population sizes.) 

10. In a certain region, about 6% of a city’s population moves 
to the surrounding suburbs each year, and about 4% of the 
suburban population moves into the city. In 2015, there were 
10,000,000 residents in the city and 800,000 in the suburbs. 
Set up a difference equation that describes this situation, 
where $\mathbf{x}_0$ is the initial population in 2015. Then estimate the
populations in the city and in the suburbs two years later, in 
2017. 

11. In 2012 the population of California was 38,041,430, and the 
population living in the United States but outside California 
was 275,872,610. During the year, it is estimated that 
748,252 persons moved from California to elsewhere in the 
United States, while 493,641 persons moved to California 
from elsewhere in the United States. 4 

a. Set up the migration matrix for this situation, using five 
decimal places for the migration rates into and out of 
California. Let your work show how you produced the 
migration matrix. 

b. [M] Compute the projected populations in the year 2022
for California and elsewhere in the United States, assuming
that the migration rates did not change during the 10-year
period. (These calculations do not take into account
births, deaths, or the substantial migration of persons into
California and elsewhere in the United States from other
countries.)


4 Migration data retrieved from http://www.governing.com/ 


12. [M] Budget® Rent A Car in Wichita, Kansas, has a fleet of 
about 500 cars, at three locations. A car rented at one location 
may be returned to any of the three locations. The various 
fractions of cars returned to the three locations are shown in 
the matrix below. Suppose that on Monday there are 295 cars 
at the airport (or rented from there), 55 cars at the east side 
office, and 150 cars at the west side office. What will be the 
approximate distribution of cars on Wednesday? 


        Cars Rented From:
    Airport    East    West        Returned To:

  [ .97        .05     .10 ]       Airport
  [ .00        .90     .05 ]       East
  [ .03        .05     .85 ]       West


13. [M] Let $M$ and $\mathbf{x}_0$ be as in Example 3.

a. Compute the population vectors $\mathbf{x}_k$ for $k = 1, \ldots, 20$.
Discuss what you find. 

b. Repeat part (a) with an initial population of 350,000 in 
the city and 650,000 in the suburbs. What do you find? 

14. [M] Study how changes in boundary temperatures on a steel
plate affect the temperatures at interior points on the plate.

a. Begin by estimating the temperatures $T_1$, $T_2$, $T_3$, $T_4$ at
each of the sets of four points on the steel plate shown in
the figure. In each case, the value of $T_k$ is approximated by
the average of the temperatures at the four closest points.
See Exercises 33 and 34 in Section 1.1, where the values
(in degrees) turn out to be (20, 27.5, 30, 22.5). How is this
list of values related to your results for the points in set
(a) and set (b)?

b. Without making any computations, guess the interior
temperatures in (a) when the boundary temperatures are
all multiplied by 3. Check your guess.

c. Finally, make a general conjecture about the correspondence
from the list of eight boundary temperatures to the
list of four interior temperatures.


[Figure: Plate A and Plate B]


SOLUTION TO PRACTICE PROBLEM

$$A = \begin{bmatrix} 36 & 51 & 13 \\ 52 & 34 & 74 \\ 0 & 7 & 1.1 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 33 \\ 45 \\ 3 \end{bmatrix}$$


CHAPTER 1 SUPPLEMENTARY EXERCISES


1. Mark each statement True or False. Justify each answer. (If
true, cite appropriate facts or theorems. If false, explain why
or give a counterexample that shows why the statement is not
true in every case.)

a. Every matrix is row equivalent to a unique matrix in
echelon form.

b. Any system of $n$ linear equations in $n$ variables has at
most $n$ solutions.

c. If a system of linear equations has two different solutions,
it must have infinitely many solutions.

d. If a system of linear equations has no free variables, then
it has a unique solution.

e. If an augmented matrix $[\, A \ \ \mathbf{b} \,]$ is transformed into
$[\, C \ \ \mathbf{d} \,]$ by elementary row operations, then the equations
$A\mathbf{x} = \mathbf{b}$ and $C\mathbf{x} = \mathbf{d}$ have exactly the same solution
sets.

f. If a system $A\mathbf{x} = \mathbf{b}$ has more than one solution, then so
does the system $A\mathbf{x} = \mathbf{0}$.

g. If $A$ is an $m \times n$ matrix and the equation $A\mathbf{x} = \mathbf{b}$ is
consistent for some $\mathbf{b}$, then the columns of $A$ span $\mathbb{R}^m$.

h. If an augmented matrix $[\, A \ \ \mathbf{b} \,]$ can be transformed by
elementary row operations into reduced echelon form,
then the equation $A\mathbf{x} = \mathbf{b}$ is consistent.

i. If matrices $A$ and $B$ are row equivalent, they have the
same reduced echelon form.

j. The equation $A\mathbf{x} = \mathbf{0}$ has the trivial solution if and only
if there are no free variables.

k. If $A$ is an $m \times n$ matrix and the equation $A\mathbf{x} = \mathbf{b}$ is
consistent for every $\mathbf{b}$ in $\mathbb{R}^m$, then $A$ has $m$ pivot columns.

l. If an $m \times n$ matrix $A$ has a pivot position in every row,
then the equation $A\mathbf{x} = \mathbf{b}$ has a unique solution for each
$\mathbf{b}$ in $\mathbb{R}^m$.

m. If an $n \times n$ matrix $A$ has $n$ pivot positions, then the
reduced echelon form of $A$ is the $n \times n$ identity matrix.

n. If $3 \times 3$ matrices $A$ and $B$ each have three pivot positions,
then $A$ can be transformed into $B$ by elementary
row operations.

o. If $A$ is an $m \times n$ matrix, if the equation $A\mathbf{x} = \mathbf{b}$ has at
least two different solutions, and if the equation $A\mathbf{x} = \mathbf{c}$
is consistent, then the equation $A\mathbf{x} = \mathbf{c}$ has many solutions.

p. If $A$ and $B$ are row equivalent $m \times n$ matrices and if the
columns of $A$ span $\mathbb{R}^m$, then so do the columns of $B$.

q. If none of the vectors in the set $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ in $\mathbb{R}^3$ is
a multiple of one of the other vectors, then $S$ is linearly
independent.

r. If $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$ is linearly independent, then $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are
not in $\mathbb{R}^2$.


s. In some cases, it is possible for four vectors to span 



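t. If $\mathbf{u}$ and $\mathbf{v}$ are in $\mathbb{R}^m$, then $-\mathbf{u}$ is in Span$\{\mathbf{u}, \mathbf{v}\}$.

u. If $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are nonzero vectors in $\mathbb{R}^2$, then $\mathbf{w}$ is a linear
combination of $\mathbf{u}$ and $\mathbf{v}$.

v. If $\mathbf{w}$ is a linear combination of $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$, then $\mathbf{u}$ is a
linear combination of $\mathbf{v}$ and $\mathbf{w}$.

w. Suppose that $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ are in $\mathbb{R}^5$, $\mathbf{v}_2$ is not a multiple
of $\mathbf{v}_1$, and $\mathbf{v}_3$ is not a linear combination of $\mathbf{v}_1$ and $\mathbf{v}_2$.
Then $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is linearly independent.

x. A linear transformation is a function.

y. If $A$ is a $6 \times 5$ matrix, the linear transformation $\mathbf{x} \mapsto A\mathbf{x}$
cannot map $\mathbb{R}^5$ onto $\mathbb{R}^6$.

z. If $A$ is an $m \times n$ matrix with $m$ pivot columns, then the
linear transformation $\mathbf{x} \mapsto A\mathbf{x}$ is a one-to-one mapping.


2. Let $a$ and $b$ represent real numbers. Describe the possible
solution sets of the (linear) equation $ax = b$. [Hint: The
number of solutions depends upon $a$ and $b$.]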


3. The solutions $(x, y, z)$ of a single linear equation

$$ax + by + cz = d$$

form a plane in $\mathbb{R}^3$ when $a$, $b$, and $c$ are not all zero. Construct
sets of three linear equations whose graphs (a) intersect in
a single line, (b) intersect in a single point, and (c) have no
points in common. Typical graphs are illustrated in the figure.

[Figure: (a) three planes intersecting in a line; (b) three planes intersecting in a point; (c) three planes with no intersection (two configurations).]


4. Suppose the coefficient matrix of a linear system of three
equations in three variables has a pivot position in each
column. Explain why the system has a unique solution.

5. Determine $h$ and $k$ such that the solution set of the system
(i) is empty, (ii) contains a unique solution, and (iii) contains
infinitely many solutions.

a. $x_1 + 3x_2 = k$
   $4x_1 + hx_2 = 8$

b. $-2x_1 + hx_2 = 1$
   $6x_1 + kx_2 = -2$

6. Consider the problem of determining whether the following
system of equations is consistent:

$4x_1 - 2x_2 + 7x_3 = -5$
$8x_1 - 3x_2 + 10x_3 = -3$

a. Define appropriate vectors, and restate the problem in
terms of linear combinations. Then solve that problem.

b. Define an appropriate matrix, and restate the problem
using the phrase “columns of A.”

c. Define an appropriate linear transformation $T$ using the
matrix in (b), and restate the problem in terms of $T$.

7. Consider the problem of determining whether the following
system of equations is consistent for all $b_1$, $b_2$, $b_3$:

$2x_1 - 4x_2 - 2x_3 = b_1$
$-5x_1 + x_2 + x_3 = b_2$
$7x_1 - 5x_2 - 3x_3 = b_3$

a. Define appropriate vectors, and restate the problem in
terms of Span$\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. Then solve that problem.

b. Define an appropriate matrix, and restate the problem
using the phrase “columns of A.”

c. Define an appropriate linear transformation $T$ using the
matrix in (b), and restate the problem in terms of $T$.

8. Describe the possible echelon forms of the matrix $A$. Use the
notation of Example 1 in Section 1.2.

a. $A$ is a $2 \times 3$ matrix whose columns span $\mathbb{R}^2$.

b. $A$ is a $3 \times 3$ matrix whose columns span $\mathbb{R}^3$.

9. Write the vector

as the sum of two vectors,
one on the line $\{(x, y) : y = 2x\}$ and one on the line
$\{(x, y) : y = x/2\}$.

10. Let $\mathbf{a}_1$, $\mathbf{a}_2$, and $\mathbf{b}$ be the vectors in $\mathbb{R}^2$ shown in the figure, and
let $A = [\, \mathbf{a}_1 \ \ \mathbf{a}_2 \,]$. Does the equation $A\mathbf{x} = \mathbf{b}$ have a solution?
If so, is the solution unique? Explain.


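11. Construct a $2 \times 3$ matrix $A$, not in echelon form, such that the
solution of $A\mathbf{x} = \mathbf{0}$ is a line in $\mathbb{R}^3$.

12. Construct a $2 \times 3$ matrix $A$, not in echelon form, such that the
solution of $A\mathbf{x} = \mathbf{0}$ is a plane in $\mathbb{R}^3$.

13. Write the reduced echelon form of a $3 \times 3$ matrix $A$ such
that the first two columns of $A$ are pivot columns and

$$A \begin{bmatrix} 3 \\ -2 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

14. Determine the value(s) of $a$ such that
$\left\{ \begin{bmatrix} 1 \\ a \end{bmatrix}, \begin{bmatrix} a \\ a + 2 \end{bmatrix} \right\}$ is
linearly independent.

15. In (a) and (b), suppose the vectors are linearly independent.
What can you say about the numbers $a, \ldots, f$? Justify your
answers. [Hint: Use a theorem for (b).]

a. $\begin{bmatrix} a \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} b \\ c \\ 0 \end{bmatrix}, \begin{bmatrix} d \\ e \\ f \end{bmatrix}$

b. $\begin{bmatrix} a \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} b \\ c \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} d \\ e \\ f \\ 1 \end{bmatrix}$

16. Use Theorem 7 in Section 1.7 to explain why the columns of
the matrix $A$ are linearly independent.

$$A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 5 & 0 & 0 \\ 3 & 6 & 8 & 0 \\ 4 & 7 & 9 & 10 \end{bmatrix}$$

17. Explain why a set $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$ in $\mathbb{R}^5$ must be linearly
independent when $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is linearly independent and $\mathbf{v}_4$
is not in Span$\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$.

18. Suppose $\{\mathbf{v}_1, \mathbf{v}_2\}$ is a linearly independent set in $\mathbb{R}^n$. Show
that $\{\mathbf{v}_1, \mathbf{v}_1 + \mathbf{v}_2\}$ is also linearly independent.

19. Suppose $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$ are distinct points on one line in $\mathbb{R}^3$. The
line need not pass through the origin. Show that $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$
is linearly dependent.

20. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation, and suppose
$T(\mathbf{u}) = \mathbf{v}$. Show that $T(-\mathbf{u}) = -\mathbf{v}$.

21. Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be the linear transformation that reflects
each vector through the plane $x_2 = 0$. That is,
$T(x_1, x_2, x_3) = (x_1, -x_2, x_3)$. Find the standard matrix of $T$.

22. Let $A$ be a $3 \times 3$ matrix with the property that the linear
transformation $\mathbf{x} \mapsto A\mathbf{x}$ maps $\mathbb{R}^3$ onto $\mathbb{R}^3$. Explain why the
transformation must be one-to-one.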


23. A Givens rotation is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^n$
used in computer programs to create a zero entry in a vector
(usually a column of a matrix). The standard matrix of a
Givens rotation in $\mathbb{R}^2$ has the form



Find $a$ and $b$ such that $\begin{bmatrix} 4 \\ 3 \end{bmatrix}$ is rotated into



[Figure: A Givens rotation in $\mathbb{R}^2$.]


24. The following equation describes a Givens rotation in R 
Find a and b . 




25. A large apartment building is to be built using modular
construction techniques. The arrangement of apartments on
any particular floor is to be chosen from one of three basic
floor plans. Plan A has 18 apartments on one floor, including
3 three-bedroom units, 7 two-bedroom units, and 8
one-bedroom units. Each floor of plan B includes 4 three-bedroom
units, 4 two-bedroom units, and 8 one-bedroom
units. Each floor of plan C includes 5 three-bedroom units,
3 two-bedroom units, and 9 one-bedroom units. Suppose the
building contains a total of $x_1$ floors of plan A, $x_2$ floors of
plan B, and $x_3$ floors of plan C.

a. What interpretation can be given to the vector $x_1 \begin{bmatrix} 3 \\ 7 \\ 8 \end{bmatrix}$?

b. Write a formal linear combination of vectors that expresses
the total numbers of three-, two-, and one-bedroom
apartments contained in the building.

c. [M] Is it possible to design the building with exactly 66
three-bedroom units, 74 two-bedroom units, and 136 one-bedroom
units? If so, is there more than one way to do it?
Explain your answer.
Explain your answer. 

























Matrix Algebra 


INTRODUCTORY EXAMPLE 

Computer Models in Aircraft Design 




To design the next generation of commercial and military 
aircraft, engineers at Boeing’s Phantom Works use 3D 
modeling and computational fluid dynamics (CFD). They 
study the airflow around a virtual airplane to answer 
important design questions before physical models are 
created. This has drastically reduced design cycle times 
and cost—and linear algebra plays a crucial role in the 
process. 

The virtual airplane begins as a mathematical “wire-frame”
model that exists only in computer memory and
on graphics display terminals. (Model of a Boeing 777 is
shown.) This mathematical model organizes and influences 
each step of the design and manufacture of the airplane— 
both the exterior and interior. The CFD analysis concerns 
the exterior surface. 

Although the finished skin of a plane may seem 
smooth, the geometry of the surface is complicated. In 
addition to wings and a fuselage, an aircraft has nacelles, 
stabilizers, slats, flaps, and ailerons. The way air flows 
around these structures determines how the plane moves 
through the sky. Equations that describe the airflow are 
complicated, and they must account for engine intake, 
engine exhaust, and the wakes left by the wings of the 
plane. To study the airflow, engineers need a highly refined 
description of the plane’s surface. 

A computer creates a model of the surface by first
superimposing a three-dimensional grid of “boxes” on the
original wire-frame model. Boxes in this grid lie either
completely inside or completely outside the plane, or they
intersect the surface of the plane. The computer selects
the boxes that intersect the surface and subdivides them,
retaining only the smaller boxes that still intersect the
surface. The subdividing process is repeated until the grid
is extremely fine. A typical grid can include more than
400,000 boxes.

The process for finding the airflow around the plane 
involves repeatedly solving a system of linear equations 
Ax = b that may involve up to 2 million equations and 
variables. The vector b changes each time, based on data 
from the grid and solutions of previous equations. Using 
the fastest computers available commercially, a Phantom 
Works team can spend from a few hours to several days 
setting up and solving a single airflow problem. After the 
team analyzes the solution, they may make small changes 
to the airplane surface and begin the whole process again. 
Thousands of CFD runs may be required. 

This chapter presents two important concepts that 
assist in the solution of such massive systems of equations: 

• Partitioned matrices: A typical CFD system of 
equations has a “sparse” coefficient matrix with 
mostly zero entries. Grouping the variables correctly 
leads to a partitioned matrix with many zero blocks. 
Section 2.4 introduces such matrices and describes 
some of their applications. 
• Matrix factorizations: Even when written with 
partitioned matrices, the system of equations is 
complicated. To further simplify the computations, 
the CFD software at Boeing uses what is called 
an LU factorization of the coefficient matrix. 
Section 2.5 discusses LU and other useful matrix 
factorizations. Further details about factorizations 
appear at several points later in the text. 

To analyze a solution of an airflow system, engineers 
want to visualize the airflow over the surface of the plane. 
They use computer graphics, and linear algebra provides 
the engine for the graphics. The wire-frame model of the 
plane’s surface is stored as data in many matrices. Once the 
image has been rendered on a computer screen, engineers 
can change its scale, zoom in or out of small regions, and 
rotate the image to see parts that may be hidden from view. 
Each of these operations is accomplished by appropriate 
matrix multiplications. Section 2.7 explains the basic 
ideas. 

Modern CFD has revolutionized wing design. The Boeing 
Blended Wing Body is in design for the year 2020 or sooner. 


Our ability to analyze and solve equations will be greatly enhanced when we can perform 
algebraic operations with matrices. Furthermore, the definitions and theorems in this 
chapter provide some basic tools for handling the many applications of linear algebra 
that involve two or more matrices. For square matrices, the Invertible Matrix Theorem 
in Section 2.3 ties together most of the concepts treated earlier in the text. Sections 2.4 
and 2.5 examine partitioned matrices and matrix factorizations, which appear in most 
modern uses of linear algebra. Sections 2.6 and 2.7 describe two interesting applications 
of matrix algebra, to economics and to computer graphics. 


2.1 MATRIX OPERATIONS 


If $A$ is an $m \times n$ matrix (that is, a matrix with $m$ rows and $n$ columns), then the scalar 
entry in the $i$th row and $j$th column of $A$ is denoted by $a_{ij}$ and is called the $(i, j)$-entry 
of $A$. See Figure 1. For instance, the $(3, 2)$-entry is the number $a_{32}$ in the third row, 
second column. Each column of $A$ is a list of $m$ real numbers, which identifies a vector 
in $\mathbb{R}^m$. Often, these columns are denoted by $\mathbf{a}_1, \ldots, \mathbf{a}_n$, and the matrix $A$ is written as 

\[ A = [\, \mathbf{a}_1 \;\; \mathbf{a}_2 \;\; \cdots \;\; \mathbf{a}_n \,] \]

Observe that the number $a_{ij}$ is the $i$th entry (from the top) of the $j$th column vector $\mathbf{a}_j$. 

The diagonal entries in an $m \times n$ matrix $A = [\, a_{ij} \,]$ are $a_{11}, a_{22}, a_{33}, \ldots$, and they 
form the main diagonal of $A$. A diagonal matrix is a square $n \times n$ matrix whose 
nondiagonal entries are zero. An example is the $n \times n$ identity matrix, $I_n$. An $m \times n$ 
matrix whose entries are all zero is a zero matrix and is written as $0$. The size of a zero 
matrix is usually clear from the context. 

FIGURE 1 Matrix notation: the entry $a_{ij}$ lies in row $i$ and column $j$ of $A = [\, \mathbf{a}_1 \; \cdots \; \mathbf{a}_j \; \cdots \; \mathbf{a}_n \,]$. 



Sums and Scalar Multiples 

The arithmetic for vectors described earlier has a natural extension to matrices. We say 
that two matrices are equal if they have the same size (i.e., the same number of rows 
and the same number of columns) and if their corresponding columns are equal, which 
amounts to saying that their corresponding entries are equal. If A and B are m x n 
matrices, then the sum A + B is the m x n matrix whose columns are the sums of 
the corresponding columns in A and B. Since vector addition of the columns is done 
entrywise, each entry in A + B is the sum of the corresponding entries in A and B. The 
sum A + B is defined only when A and B are the same size. 


EXAMPLE 1 Let 



Then 



A + B = 





but A + C is not defined because A and C have different sizes. 



If $r$ is a scalar and $A$ is a matrix, then the scalar multiple $rA$ is the matrix whose 
columns are $r$ times the corresponding columns in $A$. As with vectors, $-A$ stands for 
$(-1)A$, and $A - B$ is the same as $A + (-1)B$. 
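
The entrywise definitions of the sum and the scalar multiple can be sketched in plain Python. This is only an illustration: the helper names (`mat_add`, `scalar_mul`, `mat_sub`) and the sample matrices are assumptions made for this sketch, not the matrices of Examples 1 and 2. Matrices are stored as lists of rows.

```python
def mat_add(A, B):
    """Entrywise sum; defined only when A and B have the same size."""
    same_size = len(A) == len(B) and all(
        len(ra) == len(rb) for ra, rb in zip(A, B)
    )
    if not same_size:
        raise ValueError("A + B is defined only when A and B are the same size")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scalar_mul(r, A):
    """Scalar multiple rA: every entry of A is multiplied by r."""
    return [[r * a for a in row] for row in A]


def mat_sub(A, B):
    """A - B, computed as A + (-1)B."""
    return mat_add(A, scalar_mul(-1, B))


# Hypothetical sample matrices (2 x 3, 2 x 3, and 2 x 2).
A = [[1, 0, 2], [3, -1, 4]]
B = [[2, 5, -1], [0, 1, 1]]
C = [[1, 2], [3, 4]]

print(mat_add(A, B))                 # [[3, 5, 1], [3, 0, 5]]
print(mat_sub(A, scalar_mul(2, B)))  # A - 2B = [[-3, -10, 4], [3, -3, 2]]

try:
    mat_add(A, C)                    # sizes differ, so the sum is undefined
except ValueError as err:
    print("undefined:", err)
```
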


EXAMPLE 2 If A and B are the matrices in Example 1, then 


2 B 

A-2B 



It was unnecessary in Example 2 to compute $A - 2B$ as $A + (-1)2B$ because the 
usual rules of algebra apply to sums and scalar multiples of matrices, as the following 
theorem shows. 


THEOREM 1 

Let $A$, $B$, and $C$ be matrices of the same size, and let $r$ and $s$ be scalars. 

a. $A + B = B + A$ 
b. $(A + B) + C = A + (B + C)$ 
c. $A + 0 = A$ 
d. $r(A + B) = rA + rB$ 
e. $(r + s)A = rA + sA$ 
f. $r(sA) = (rs)A$ 

Each equality in Theorem 1 is verified by showing that the matrix on the left side has 
the same size as the matrix on the right and that corresponding columns are equal. Size 
is no problem because $A$, $B$, and $C$ are equal in size. The equality of columns follows 
immediately from analogous properties of vectors. For instance, if the $j$th columns of 
$A$, $B$, and $C$ are $\mathbf{a}_j$, $\mathbf{b}_j$, and $\mathbf{c}_j$, respectively, then the $j$th columns of $(A + B) + C$ and 
$A + (B + C)$ are 

\[ (\mathbf{a}_j + \mathbf{b}_j) + \mathbf{c}_j \quad\text{and}\quad \mathbf{a}_j + (\mathbf{b}_j + \mathbf{c}_j) \]

respectively. Since these two vector sums are equal for each $j$, property (b) is verified. 

Because of the associative property of addition, we can simply write A + B + C 
for the sum, which can be computed either as (A + B) + C or as A + (B + C). The 
same applies to sums of four or more matrices. 


Matrix Multiplication 

When a matrix $B$ multiplies a vector $\mathbf{x}$, it transforms $\mathbf{x}$ into the vector $B\mathbf{x}$. If this vector 
is then multiplied in turn by a matrix $A$, the resulting vector is $A(B\mathbf{x})$. See Figure 2. 

FIGURE 2 Multiplication by B and then A: x is mapped to Bx, and Bx is mapped to A(Bx). 

Thus $A(B\mathbf{x})$ is produced from $\mathbf{x}$ by a composition of mappings, namely the linear 
transformations studied in Section 1.8. Our goal is to represent this composite mapping as 
multiplication by a single matrix, denoted by $AB$, so that 

A(Bx) = (AB)x (1) 


See Figure 3. 


FIGURE 3 Multiplication by AB: x is mapped directly to A(Bx). 


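If $A$ is $m \times n$, $B$ is $n \times p$, and $\mathbf{x}$ is in $\mathbb{R}^p$, denote the columns of $B$ by $\mathbf{b}_1, \ldots, \mathbf{b}_p$ 
and the entries in $\mathbf{x}$ by $x_1, \ldots, x_p$. Then 

\[ B\mathbf{x} = x_1\mathbf{b}_1 + \cdots + x_p\mathbf{b}_p \]

By the linearity of multiplication by $A$, 

\begin{align*}
A(B\mathbf{x}) &= A(x_1\mathbf{b}_1) + \cdots + A(x_p\mathbf{b}_p) \\
               &= x_1 A\mathbf{b}_1 + \cdots + x_p A\mathbf{b}_p
\end{align*}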
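The vector $A(B\mathbf{x})$ is a linear combination of the vectors $A\mathbf{b}_1, \ldots, A\mathbf{b}_p$, using the entries 
in $\mathbf{x}$ as weights. In matrix notation, this linear combination is written as 

\[ A(B\mathbf{x}) = [\, A\mathbf{b}_1 \;\; A\mathbf{b}_2 \;\; \cdots \;\; A\mathbf{b}_p \,]\,\mathbf{x} \]

Thus multiplication by $[\, A\mathbf{b}_1 \;\; A\mathbf{b}_2 \;\; \cdots \;\; A\mathbf{b}_p \,]$ transforms $\mathbf{x}$ into $A(B\mathbf{x})$. We have found 
the matrix we sought! 


DEFINITION 

If $A$ is an $m \times n$ matrix, and if $B$ is an $n \times p$ matrix with columns $\mathbf{b}_1, \ldots, \mathbf{b}_p$, 
then the product $AB$ is the $m \times p$ matrix whose columns are $A\mathbf{b}_1, \ldots, A\mathbf{b}_p$. That 
is, 

\[ AB = A[\, \mathbf{b}_1 \;\; \mathbf{b}_2 \;\; \cdots \;\; \mathbf{b}_p \,] = [\, A\mathbf{b}_1 \;\; A\mathbf{b}_2 \;\; \cdots \;\; A\mathbf{b}_p \,] \]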


This definition makes equation (1) true for all $\mathbf{x}$ in $\mathbb{R}^p$. Equation (1) proves that the 
composite mapping in Figure 3 is a linear transformation and that its standard matrix is 
$AB$. Multiplication of matrices corresponds to composition of linear transformations. 
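
The column definition of the product can be sketched in plain Python as follows. The helper names (`mat_vec`, `columns`, `mat_mul_by_columns`) and the test vector `x` are assumptions made for this illustration; the matrices are chosen to agree with the entries 21 and 13 computed in Example 5. The final check confirms equation (1), A(Bx) = (AB)x, for one particular x, which illustrates the identity but does not prove it.

```python
def mat_vec(A, x):
    """Ax as a linear combination of the columns of A, weighted by x."""
    m, n = len(A), len(A[0])
    if n != len(x):
        raise ValueError("number of columns of A must equal the length of x")
    result = [0] * m
    for j in range(n):            # add x_j times column j of A
        for i in range(m):
            result[i] += x[j] * A[i][j]
    return result


def columns(B):
    """Return the columns of B as a list of vectors."""
    return [list(col) for col in zip(*B)]


def mat_mul_by_columns(A, B):
    """AB = [Ab_1 ... Ab_p], built one column at a time."""
    product_columns = [mat_vec(A, b) for b in columns(B)]
    return [list(row) for row in zip(*product_columns)]


A = [[2, 3], [1, -5]]             # 2 x 2
B = [[4, 3, 6], [1, -2, 3]]       # 2 x 3
AB = mat_mul_by_columns(A, B)
print(AB)                         # [[11, 0, 21], [-1, 13, -9]]

x = [1, -2, 3]                    # hypothetical vector in R^3
print(mat_vec(A, mat_vec(B, x)))  # A(Bx) = [74, -54]
print(mat_vec(AB, x))             # (AB)x = [74, -54]
```
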


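EXAMPLE 3 Compute $AB$, where $A = \begin{bmatrix} 2 & 3 \\ 1 & -5 \end{bmatrix}$ and $B = \begin{bmatrix} 4 & 3 & 6 \\ 1 & -2 & 3 \end{bmatrix}$. 

SOLUTION Write $B = [\, \mathbf{b}_1 \;\; \mathbf{b}_2 \;\; \mathbf{b}_3 \,]$, and compute: 

\[
A\mathbf{b}_1 = \begin{bmatrix} 2 & 3 \\ 1 & -5 \end{bmatrix}\begin{bmatrix} 4 \\ 1 \end{bmatrix} = \begin{bmatrix} 11 \\ -1 \end{bmatrix}, \quad
A\mathbf{b}_2 = \begin{bmatrix} 2 & 3 \\ 1 & -5 \end{bmatrix}\begin{bmatrix} 3 \\ -2 \end{bmatrix} = \begin{bmatrix} 0 \\ 13 \end{bmatrix}, \quad
A\mathbf{b}_3 = \begin{bmatrix} 2 & 3 \\ 1 & -5 \end{bmatrix}\begin{bmatrix} 6 \\ 3 \end{bmatrix} = \begin{bmatrix} 21 \\ -9 \end{bmatrix}
\]

Then 

\[
AB = A[\, \mathbf{b}_1 \;\; \mathbf{b}_2 \;\; \mathbf{b}_3 \,] = [\, A\mathbf{b}_1 \;\; A\mathbf{b}_2 \;\; A\mathbf{b}_3 \,] = \begin{bmatrix} 11 & 0 & 21 \\ -1 & 13 & -9 \end{bmatrix}
\]


Notice that since the first column of $AB$ is $A\mathbf{b}_1$, this column is a linear combination 
of the columns of $A$ using the entries in $\mathbf{b}_1$ as weights. A similar statement is true for 
each column of $AB$. 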


Each column of AB is a linear combination of the columns of A using weights 
from the corresponding column of B. 


Obviously, the number of columns of $A$ must match the number of rows in $B$ in 
order for a linear combination such as $A\mathbf{b}_1$ to be defined. Also, the definition of $AB$ 
shows that $AB$ has the same number of rows as $A$ and the same number of columns 
as $B$. 

EXAMPLE 4 If $A$ is a $3 \times 5$ matrix and $B$ is a $5 \times 2$ matrix, what are the sizes of 
$AB$ and $BA$, if they are defined? 

SOLUTION Since $A$ has 5 columns and $B$ has 5 rows, the product $AB$ is defined and 
is a $3 \times 2$ matrix. Writing the sizes side by side as $(3 \times 5)(5 \times 2)$, the two inner 
numbers (5 and 5) must match, and the two outer numbers (3 and 2) give the size of $AB$. 

The product $BA$ is not defined because the 2 columns of $B$ do not match the 3 rows 
of $A$. ■ 

The definition of AB is important for theoretical work and applications, but the 
following rule provides a more efficient method for calculating the individual entries in 
AB when working small problems by hand. 


ROW-COLUMN RULE FOR COMPUTING AB 


If the product $AB$ is defined, then the entry in row $i$ and column $j$ of $AB$ is the 
sum of the products of corresponding entries from row $i$ of $A$ and column $j$ of 
$B$. If $(AB)_{ij}$ denotes the $(i, j)$-entry in $AB$, and if $A$ is an $m \times n$ matrix, then 

\[ (AB)_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj} \]


To verify this rule, let $B = [\, \mathbf{b}_1 \; \cdots \; \mathbf{b}_p \,]$. Column $j$ of $AB$ is $A\mathbf{b}_j$, and we can 
compute $A\mathbf{b}_j$ by the row-vector rule for computing $A\mathbf{x}$ from Section 1.4. The $i$th entry 
in $A\mathbf{b}_j$ is the sum of the products of corresponding entries from row $i$ of $A$ and the 
vector $\mathbf{b}_j$, which is precisely the computation described in the rule for computing the 
$(i, j)$-entry of $AB$. 
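
A minimal sketch of the row-column rule in plain Python is given below. The function name `mat_mul` is an assumption for this illustration, and the matrices are chosen to match the entries 21 and 13 worked out in Example 5. The sketch also checks the size requirement discussed in Example 4: the product is rejected when the number of columns of the left factor does not match the number of rows of the right factor.

```python
def mat_mul(A, B):
    """Row-column rule: (AB)_ij = a_i1*b_1j + a_i2*b_2j + ... + a_in*b_nj."""
    m, n = len(A), len(A[0])
    if len(B) != n:
        raise ValueError("columns of the left factor must match rows of the right")
    p = len(B[0])
    return [
        [sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
        for i in range(m)
    ]


A = [[2, 3], [1, -5]]            # 2 x 2
B = [[4, 3, 6], [1, -2, 3]]      # 2 x 3

AB = mat_mul(A, B)               # 2 x 3 product
print(AB[0][2])                  # row 1, column 3: 2(6) + 3(3) = 21
print(AB[1][1])                  # row 2, column 2: 1(3) + (-5)(-2) = 13

try:
    mat_mul(B, A)                # B has 3 columns, A has 2 rows
    ba_defined = True
except ValueError:
    ba_defined = False
print(ba_defined)                # False
```

Only the entry that is needed has to be computed with this rule, which is why it is convenient for small problems worked by hand.
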


EXAMPLE 5 Use the row-column rule to compute two of the entries in AB for the 
matrices in Example 3. An inspection of the numbers involved will make it clear how 
the two methods for calculating AB produce the same matrix. 


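SOLUTION To find the entry in row 1 and column 3 of $AB$, consider row 1 of $A$ and 
column 3 of $B$. Multiply corresponding entries and add the results, as shown below: 

\[
AB = \begin{bmatrix} 2 & 3 \\ 1 & -5 \end{bmatrix}\begin{bmatrix} 4 & 3 & 6 \\ 1 & -2 & 3 \end{bmatrix}
= \begin{bmatrix} \square & \square & 2(6) + 3(3) \\ \square & \square & \square \end{bmatrix}
= \begin{bmatrix} \square & \square & 21 \\ \square & \square & \square \end{bmatrix}
\]

For the entry in row 2 and column 2 of $AB$, use row 2 of $A$ and column 2 of $B$: 

\[
\begin{bmatrix} 2 & 3 \\ 1 & -5 \end{bmatrix}\begin{bmatrix} 4 & 3 & 6 \\ 1 & -2 & 3 \end{bmatrix}
= \begin{bmatrix} \square & \square & 21 \\ \square & 1(3) + (-5)(-2) & \square \end{bmatrix}
= \begin{bmatrix} \square & \square & 21 \\ \square & 13 & \square \end{bmatrix}
\]
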
EXAMPLE 6 Find the entries in the second row of $AB$, where 



SOLUTION By the row-column rule, the entries of the second row of $AB$ come from 
row 2 of $A$ (and the columns of $B$): 

\[
AB = \begin{bmatrix} \square & \square \\ -4 + 21 - 12 & 6 + 3 - 8 \\ \square & \square \\ \square & \square \end{bmatrix}
= \begin{bmatrix} \square & \square \\ 5 & 1 \\ \square & \square \\ \square & \square \end{bmatrix}
\]


Notice that since Example 6 requested only the second row of AB , we could have 
written just the second row of A to the left of B and computed 



This observation about rows of $AB$ is true in general and follows from the row-column 
rule. Let $\operatorname{row}_i(A)$ denote the $i$th row of a matrix $A$. Then 

\[ \operatorname{row}_i(AB) = \operatorname{row}_i(A) \cdot B \]



Properties of Matrix Multiplication 

The following theorem lists the standard properties of matrix multiplication. Recall that 
$I_m$ represents the $m \times m$ identity matrix and $I_m\mathbf{x} = \mathbf{x}$ for all $\mathbf{x}$ in $\mathbb{R}^m$. 


THEOREM 2 

Let $A$ be an $m \times n$ matrix, and let $B$ and $C$ have sizes for which the indicated 
sums and products are defined. 

a. $A(BC) = (AB)C$ (associative law of multiplication) 
b. $A(B + C) = AB + AC$ (left distributive law) 
c. $(B + C)A = BA + CA$ (right distributive law) 
d. $r(AB) = (rA)B = A(rB)$ for any scalar $r$ 
e. $I_m A = A = A I_n$ (identity for matrix multiplication) 


PROOF Properties (b)-(e) are considered in the exercises. Property (a) follows from 
the fact that matrix multiplication corresponds to composition of linear transformations 
(which are functions), and it is known (or easy to check) that the composition of functions 
is associative. Here is another proof of (a) that rests on the “column definition” of 
the product of two matrices. Let 

\[ C = [\, \mathbf{c}_1 \; \cdots \; \mathbf{c}_p \,] \]

By the definition of matrix multiplication, 

\begin{align*}
BC &= [\, B\mathbf{c}_1 \; \cdots \; B\mathbf{c}_p \,] \\
A(BC) &= [\, A(B\mathbf{c}_1) \; \cdots \; A(B\mathbf{c}_p) \,]
\end{align*}

Recall from equation (1) that the definition of $AB$ makes $A(B\mathbf{x}) = (AB)\mathbf{x}$ for all $\mathbf{x}$, so 

\[ A(BC) = [\, (AB)\mathbf{c}_1 \; \cdots \; (AB)\mathbf{c}_p \,] = (AB)C \] ■ 


The associative and distributive laws in Theorems 1 and 2 say essentially that pairs 
of parentheses in matrix expressions can be inserted and deleted in the same way as in 
the algebra of real numbers. In particular, we can write $ABC$ for the product, which 
can be computed either as $A(BC)$ or as $(AB)C$.$^1$ Similarly, a product $ABCD$ of four 
matrices can be computed as $A(BCD)$ or $(ABC)D$ or $A(BC)D$, and so on. It does not 
matter how we group the matrices when computing the product, so long as the left-to-right 
order of the matrices is preserved. 
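
Although the grouping does not change the product, it can change the amount of arithmetic, as the note attached to this paragraph observes. The sketch below counts scalar multiplications under the row-column rule; the function names and the sample sizes (m = 100, n = 50, p = 1) are assumptions chosen for illustration. Here A is m x n, B is n x n (square), and C is n x p.

```python
def mult_count(m, n, p):
    """Scalar multiplications used by the row-column rule for (m x n)(n x p)."""
    return m * n * p


def cost_AB_then_C(m, n, p):
    """Cost of (AB)C: first AB (m x n times n x n), then (AB)C (m x n times n x p)."""
    return mult_count(m, n, n) + mult_count(m, n, p)


def cost_A_then_BC(m, n, p):
    """Cost of A(BC): first BC (n x n times n x p), then A(BC) (m x n times n x p)."""
    return mult_count(n, n, p) + mult_count(m, n, p)


m, n, p = 100, 50, 1              # C has fewer columns (1) than A has rows (100)
print(cost_AB_then_C(m, n, p))    # 255000
print(cost_A_then_BC(m, n, p))    # 7500
```

The two costs differ only in the first term, m·n·n versus n·n·p, so A(BC) is cheaper exactly when p < m.
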

The left-to-right order in products is critical because AB and BA are usually not 
the same. This is not surprising, because the columns of AB are linear combinations of 
the columns of A , whereas the columns of BA are constructed from the columns of B . 
The position of the factors in the product $AB$ is emphasized by saying that $A$ is 
right-multiplied by $B$ or that $B$ is left-multiplied by $A$. If $AB = BA$, we say that $A$ and $B$ 
commute with one another. 


EXAMPLE 7 Let $A =$ and $B =$ . Show that these matrices do 
not commute. That is, verify that $AB \neq BA$. 


SOLUTION 


Example 7 illustrates the first of the following list of important differences between 
matrix algebra and the ordinary algebra of real numbers. See Exercises 9-12 for examples 
of these situations. 


WARNINGS: 

1. In general, $AB \neq BA$. 

2. The cancellation laws do not hold for matrix multiplication. That is, if 
AB = AC , then it is not true in general that B = C. (See Exercise 10.) 

3. If a product AB is the zero matrix, you cannot conclude in general that either 
A = 0 or B = 0. (See Exercise 12.) 
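
The three warnings can be checked numerically with small matrices. In the sketch below, the function name `mat_mul` and the matrices used for warnings 1 and 3 are hypothetical choices made for illustration; the matrices used for warning 2 are the ones listed in Exercise 10.

```python
def mat_mul(A, B):
    """Product AB by the row-column rule (sizes assumed compatible)."""
    n = len(B)
    return [
        [sum(row[k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
        for row in A
    ]


# Warning 1: AB and BA usually differ.
A1 = [[1, 2], [0, 1]]
B1 = [[1, 0], [3, 1]]
print(mat_mul(A1, B1))   # [[7, 2], [3, 1]]
print(mat_mul(B1, A1))   # [[1, 2], [3, 7]]

# Warning 2: AB = AC does not force B = C (matrices of Exercise 10).
A2 = [[2, -3], [-4, 6]]
B2 = [[8, 4], [5, 5]]
C2 = [[5, -2], [3, 1]]
print(mat_mul(A2, B2))   # [[1, -7], [-2, 14]]
print(mat_mul(A2, C2))   # [[1, -7], [-2, 14]]

# Warning 3: AB = 0 although neither factor is the zero matrix.
A3 = [[1, 2], [2, 4]]
B3 = [[2, -4], [-1, 2]]
print(mat_mul(A3, B3))   # [[0, 0], [0, 0]]
```
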


$^1$ When $B$ is square and $C$ has fewer columns than $A$ has rows, it is more efficient to compute $A(BC)$ than 
$(AB)C$. 
Powers of a Matrix 

If $A$ is an $n \times n$ matrix and if $k$ is a positive integer, then $A^k$ denotes the product of $k$ 
copies of $A$: 

\[ A^k = \underbrace{A \cdots A}_{k} \]

If $A$ is nonzero and if $\mathbf{x}$ is in $\mathbb{R}^n$, then $A^k\mathbf{x}$ is the result of left-multiplying $\mathbf{x}$ by $A$ 
repeatedly $k$ times. If $k = 0$, then $A^0\mathbf{x}$ should be $\mathbf{x}$ itself. Thus $A^0$ is interpreted as the 
identity matrix. Matrix powers are useful in both theory and applications (Sections 2.6, 
4.9, and later in the text). 
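
The definition of a matrix power translates directly into repeated multiplication. The following plain-Python sketch uses hypothetical helper names (`identity`, `mat_mul`, `mat_vec`, `mat_pow`) and a sample matrix chosen for illustration; it starts from the identity so that the case k = 0 returns the identity matrix, as in the text.

```python
def identity(n):
    """The n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def mat_mul(A, B):
    """Product AB by the row-column rule (sizes assumed compatible)."""
    n = len(B)
    return [
        [sum(row[k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
        for row in A
    ]


def mat_vec(A, x):
    """Matrix-vector product Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def mat_pow(A, k):
    """A^k for a square matrix A and an integer k >= 0, with A^0 = I."""
    if k < 0:
        raise ValueError("this sketch handles only k >= 0")
    result = identity(len(A))
    for _ in range(k):
        result = mat_mul(result, A)
    return result


A = [[1, 1], [0, 1]]
x = [2, 5]

print(mat_pow(A, 0))               # [[1, 0], [0, 1]]
print(mat_pow(A, 3))               # [[1, 3], [0, 1]]

# A^3 x agrees with left-multiplying x by A three times.
y = x
for _ in range(3):
    y = mat_vec(A, y)
print(y)                           # [17, 5]
print(mat_vec(mat_pow(A, 3), x))   # [17, 5]
```
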


The Transpose of a Matrix 

Given an $m \times n$ matrix $A$, the transpose of $A$ is the $n \times m$ matrix, denoted by $A^T$, 
whose columns are formed from the corresponding rows of $A$. 


EXAMPLE 8 Let 


Then 



THEOREM 3 

Let $A$ and $B$ denote matrices whose sizes are appropriate for the following sums 
and products. 

a. $(A^T)^T = A$ 

b. $(A + B)^T = A^T + B^T$ 

c. For any scalar $r$, $(rA)^T = rA^T$ 

d. $(AB)^T = B^T A^T$ 


Proofs of (a)-(c) are straightforward and are omitted. For (d), see Exercise 33. 
Usually, $(AB)^T$ is not equal to $A^T B^T$, even when $A$ and $B$ have sizes such that the 
product $A^T B^T$ is defined. 

The generalization of Theorem 3(d) to products of more than two factors can be 
stated in words as follows: 


The transpose of a product of matrices equals the product of their transposes in 
the reverse order. 
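
The reverse-order rule for transposes can be checked on a small example. The helper names (`transpose`, `mat_mul`) and the matrices below are assumptions made for this sketch. With a 2 x 3 matrix A and a 3 x 2 matrix B, the product of the transposes in the original order is defined, yet it is 3 x 3 and so cannot equal the 2 x 2 matrix (AB)^T.

```python
def transpose(A):
    """Columns of the transpose are the rows of A."""
    return [list(col) for col in zip(*A)]


def mat_mul(A, B):
    """Product AB by the row-column rule (sizes assumed compatible)."""
    n = len(B)
    return [
        [sum(row[k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
        for row in A
    ]


A = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
B = [[1, 0], [2, -1], [0, 3]]     # 3 x 2

lhs = transpose(mat_mul(A, B))                 # (AB)^T
rhs = mat_mul(transpose(B), transpose(A))      # B^T A^T
wrong = mat_mul(transpose(A), transpose(B))    # A^T B^T, a 3 x 3 matrix

print(lhs)                        # [[5, 14], [7, 13]]
print(rhs)                        # [[5, 14], [7, 13]]
print(len(wrong), len(wrong[0]))  # 3 3
```
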


The exercises contain numerical examples that illustrate properties of transposes. 
NUMERICAL NOTES 

1. The fastest way to obtain AB on a computer depends on the way in which 
the computer stores matrices in its memory. The standard high-performance 
algorithms, such as in LAPACK, calculate AB by columns, as in our definition 
of the product. (A version of LAPACK written in C++ calculates AB by rows.) 

2. The definition of AB lends itself well to parallel processing on a computer. The 
columns of B are assigned individually or in groups to different processors, 
which independently and hence simultaneously compute the corresponding 
columns of AB . 


PRACTICE PROBLEMS 


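1. Since vectors in $\mathbb{R}^n$ may be regarded as $n \times 1$ matrices, the properties of transposes 
in Theorem 3 apply to vectors, too. Let 

\[ A = \begin{bmatrix} 1 & -3 \\ -2 & 4 \end{bmatrix} \quad\text{and}\quad \mathbf{x} = \begin{bmatrix} 5 \\ 3 \end{bmatrix} \]

Compute $(A\mathbf{x})^T$, $\mathbf{x}^T A^T$, $\mathbf{x}\mathbf{x}^T$, and $\mathbf{x}^T\mathbf{x}$. Is $A^T\mathbf{x}^T$ defined? 

2. Let $A$ be a $4 \times 4$ matrix and let $\mathbf{x}$ be a vector in $\mathbb{R}^4$. What is the fastest way to compute 
$A^2\mathbf{x}$? Count the multiplications. 
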

3. Suppose A is an m x n matrix, all of whose rows are identical. Suppose B is an n x p 
matrix, all of whose columns are identical. What can be said about the entries in AB ? 


2.1 EXERCISES 


In Exercises 1 and 2, compute each matrix sum or product if it is 
defined. If an expression is undefined, explain why. Let 




1. -2 A, B-2A, 


B = 


7 

1 


D = 


3 

-1 


AC, CD 



2. A + 2B, 3C -E, CB, EB 




3. Let $A = \begin{bmatrix} 4 & -1 \\ 5 & -2 \end{bmatrix}$. Compute $3I_2 - A$ and $(3I_2)A$. 

4. Compute $A - 5I_3$ and $(5I_3)A$, when 


In Exercises 5 and 6, compute the product $AB$ in two ways: (a) by 
the definition, where $A\mathbf{b}_1$ and $A\mathbf{b}_2$ are computed separately, and 
(b) by the row-column rule for computing $AB$. 


7. If a matrix $A$ is $5 \times 3$ and the product $AB$ is $5 \times 7$, what is the 
size of $B$? 

8. How many rows does $B$ have if $BC$ is a $3 \times 4$ matrix? 


In the rest of this exercise set and in those to follow, you should 
assume that each matrix expression is defined. That is, the sizes 
of the matrices (and vectors) involved “match” appropriately. 


9. Let $A =$ and $B =$ . What value(s) of 
$k$, if any, will make $AB = BA$? 

10. Let $A = \begin{bmatrix} 2 & -3 \\ -4 & 6 \end{bmatrix}$, $B = \begin{bmatrix} 8 & 4 \\ 5 & 5 \end{bmatrix}$, and $C = \begin{bmatrix} 5 & -2 \\ 3 & 1 \end{bmatrix}$. 
Verify that $AB = AC$ and yet $B \neq C$. 

11. Let $A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 4 & 5 \end{bmatrix}$ and $D = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 5 \end{bmatrix}$. Compute 
$AD$ and $DA$. Explain how the columns or rows of $A$ 
change when $A$ is multiplied by $D$ on the right or on the 
left. Find a $3 \times 3$ matrix $B$, not the identity matrix or the zero 
matrix, such that $AB = BA$. 

12. Let $A = \begin{bmatrix} 3 & -6 \\ -1 & 2 \end{bmatrix}$. Construct a $2 \times 2$ matrix $B$ such that 
$AB$ is the zero matrix. Use two different nonzero columns 
for $B$. 


13. Let $\mathbf{r}_1, \ldots, \mathbf{r}_p$ be vectors in $\mathbb{R}^n$, and let $Q$ be an $m \times n$ matrix. 
Write the matrix $[\, Q\mathbf{r}_1 \; \cdots \; Q\mathbf{r}_p \,]$ as a product of two matrices 
(neither of which is an identity matrix). 

14. Let $U$ be the $3 \times 2$ cost matrix described in Example 6 of 
Section 1.8. The first column of $U$ lists the costs per dollar of 
output for manufacturing product B, and the second column 
lists the costs per dollar of output for product C. (The costs 
are categorized as materials, labor, and overhead.) Let $\mathbf{q}_1$ be 
a vector in $\mathbb{R}^2$ that lists the output (measured in dollars) of 
products B and C manufactured during the first quarter of 
the year, and let $\mathbf{q}_2$, $\mathbf{q}_3$, and $\mathbf{q}_4$ be the analogous vectors 
that list the amounts of products B and C manufactured in 
the second, third, and fourth quarters, respectively. Give an 
economic description of the data in the matrix $UQ$, where 
$Q = [\, \mathbf{q}_1 \;\; \mathbf{q}_2 \;\; \mathbf{q}_3 \;\; \mathbf{q}_4 \,]$. 


21. Suppose the last column of AB is entirely zero but B itself 
has no column of zeros. What can you say about the columns 
of A? 

22. Show that if the columns of B are linearly dependent, then 
so are the columns of AB. 

23. Suppose $CA = I_n$ (the $n \times n$ identity matrix). Show that the 
equation $A\mathbf{x} = \mathbf{0}$ has only the trivial solution. Explain why 
$A$ cannot have more columns than rows. 

24. Suppose $AD = I_m$ (the $m \times m$ identity matrix). Show that 
for any $\mathbf{b}$ in $\mathbb{R}^m$, the equation $A\mathbf{x} = \mathbf{b}$ has a solution. [Hint: 
Think about the equation $AD\mathbf{b} = \mathbf{b}$.] Explain why $A$ cannot 
have more rows than columns. 

25. Suppose $A$ is an $m \times n$ matrix and there exist $n \times m$ matrices 
$C$ and $D$ such that $CA = I_n$ and $AD = I_m$. Prove that $m = n$ 
and $C = D$. [Hint: Think about the product $CAD$.] 

26. Suppose $A$ is a $3 \times n$ matrix whose columns span $\mathbb{R}^3$. Explain 
how to construct an $n \times 3$ matrix $D$ such that $AD = I_3$. 


Exercises 15 and 16 concern arbitrary matrices A, B, and C for 
which the indicated sums and products are defined. Mark each 
statement True or False. Justify each answer. 


15. a. If $A$ and $B$ are $2 \times 2$ with columns $\mathbf{a}_1, \mathbf{a}_2$, and $\mathbf{b}_1, \mathbf{b}_2$, 
respectively, then $AB = [\, \mathbf{a}_1\mathbf{b}_1 \;\; \mathbf{a}_2\mathbf{b}_2 \,]$. 

b. Each column of $AB$ is a linear combination of the columns 
of $B$ using weights from the corresponding column of $A$. 

c. $AB + AC = A(B + C)$ 

d. $A^T + B^T = (A + B)^T$ 

e. The transpose of a product of matrices equals the product 
of their transposes in the same order. 


16. a. If $A$ and $B$ are $3 \times 3$ and $B = [\, \mathbf{b}_1 \;\; \mathbf{b}_2 \;\; \mathbf{b}_3 \,]$, then $AB = 
[\, A\mathbf{b}_1 + A\mathbf{b}_2 + A\mathbf{b}_3 \,]$. 

b. The second row of $AB$ is the second row of $A$ multiplied 
on the right by $B$. 

c. $(AB)C = (AC)B$ 

d. $(AB)^T = A^T B^T$ 

e. The transpose of a sum of matrices equals the sum of their 
transposes. 


17. If $A =$ and $AB =$ , determine 
the first and second columns of $B$. 


18. Suppose the first two columns, bi and b 2 , of B are equal. 
What can you say about the columns of AB (if AB is defined)? 
Why? 


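In Exercises 27 and 28, view vectors in $\mathbb{R}^n$ as $n \times 1$ matrices. For 
$\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$, the matrix product $\mathbf{u}^T\mathbf{v}$ is a $1 \times 1$ matrix, called the 
scalar product, or inner product, of $\mathbf{u}$ and $\mathbf{v}$. It is usually written 
as a single real number without brackets. The matrix product $\mathbf{u}\mathbf{v}^T$ 
is an $n \times n$ matrix, called the outer product of $\mathbf{u}$ and $\mathbf{v}$. The 
products $\mathbf{u}^T\mathbf{v}$ and $\mathbf{u}\mathbf{v}^T$ will appear later in the text. 


27. Let $\mathbf{u} = \begin{bmatrix} -2 \\ 3 \\ -4 \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$. Compute $\mathbf{u}^T\mathbf{v}$, $\mathbf{v}^T\mathbf{u}$, $\mathbf{u}\mathbf{v}^T$, and 
$\mathbf{v}\mathbf{u}^T$. 

28. If $\mathbf{u}$ and $\mathbf{v}$ are in $\mathbb{R}^n$, how are $\mathbf{u}^T\mathbf{v}$ and $\mathbf{v}^T\mathbf{u}$ related? How are 
$\mathbf{u}\mathbf{v}^T$ and $\mathbf{v}\mathbf{u}^T$ related? 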


29. Prove Theorem 2(b) and 2(c). Use the row-column rule. The 
$(i, j)$-entry in $A(B + C)$ can be written as 

\[ a_{i1}(b_{1j} + c_{1j}) + \cdots + a_{in}(b_{nj} + c_{nj}) \quad\text{or}\quad \sum_{k=1}^{n} a_{ik}(b_{kj} + c_{kj}) \]

30. Prove Theorem 2(d). [Hint: The $(i, j)$-entry in $(rA)B$ is 
$(ra_{i1})b_{1j} + \cdots + (ra_{in})b_{nj}$.] 

31. Show that $I_m A = A$ when $A$ is an $m \times n$ matrix. You can 
assume $I_m\mathbf{x} = \mathbf{x}$ for all $\mathbf{x}$ in $\mathbb{R}^m$. 

32. Show that $AI_n = A$ when $A$ is an $m \times n$ matrix. [Hint: Use 
the (column) definition of $AI_n$.] 

33. Prove Theorem 3(d). [Hint: Consider the $j$th row of $(AB)^T$.] 

34. Give a formula for $(AB\mathbf{x})^T$, where $\mathbf{x}$ is a vector and $A$ and $B$ 
are matrices of appropriate sizes. 


19. Suppose the third column of $B$ is the sum of the first two 
columns. What can you say about the third column of $AB$? 
Why? 

20. Suppose the second column of $B$ is all zeros. What can you 
say about the second column of $AB$? 


35. [M] Read the documentation for your matrix program, and 
write the commands that will produce the following matrices 
(without keying in each entry of the matrix). 

a. A 5 x 6 matrix of zeros 

b. A 3 x 5 matrix of ones 

c. The 6 x 6 identity matrix 

d. A 5 x 5 diagonal matrix, with diagonal entries 3, 5, 7, 2, 4 

A useful way to test new ideas in matrix algebra, or to make 
conjectures, is to make calculations with matrices selected at 
random. Checking a property for a few matrices does not prove 
that the property holds in general, but it makes the property more 
believable. Also, if the property is actually false, you may discover 
this when you make a few calculations. 

36. [M] Write the command(s) that will create a $6 \times 4$ matrix 
with random entries. In what range of numbers do the entries 
lie? Tell how to create a $3 \times 3$ matrix with random integer 
entries between $-9$ and $9$. [Hint: If $x$ is a random number 
such that $0 < x < 1$, then $-9.5 < 19(x - .5) < 9.5$.] 

37. [M] Construct a random $4 \times 4$ matrix $A$ and test whether 
$(A + I)(A - I) = A^2 - I$. The best way to do this is to 
compute $(A + I)(A - I) - (A^2 - I)$ and verify that this difference 
is the zero matrix. Do this for three random matrices. 
Then test $(A + B)(A - B) = A^2 - B^2$ the same way for 
three pairs of random $4 \times 4$ matrices. Report your conclusions. 


38. [M] Use at least three pairs of random $4 \times 4$ matrices $A$ and 
$B$ to test the equalities $(A + B)^T = A^T + B^T$ and $(AB)^T = 
A^T B^T$. (See Exercise 37.) Report your conclusions. [Note: 
Most matrix programs use A' for $A^T$.] 


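39. [M] Let 

\[
S = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

Compute $S^k$ for $k = 2, \ldots, 6$. 

40. [M] Describe in words what happens when you compute $A^5$, 
$A^{10}$, $A^{20}$, and $A^{30}$ for 

\[
A = \begin{bmatrix}
1/6 & 1/2 & 1/3 \\
1/2 & 1/4 & 1/4 \\
1/3 & 1/4 & 5/12
\end{bmatrix}
\]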

SOLUTIONS TO PRACTICE PROBLEMS 


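1. $A\mathbf{x} = \begin{bmatrix} 1 & -3 \\ -2 & 4 \end{bmatrix}\begin{bmatrix} 5 \\ 3 \end{bmatrix} = \begin{bmatrix} -4 \\ 2 \end{bmatrix}$. So $(A\mathbf{x})^T = [\, -4 \;\; 2 \,]$. Also, 

\[ \mathbf{x}^T A^T = [\, 5 \;\; 3 \,]\begin{bmatrix} 1 & -2 \\ -3 & 4 \end{bmatrix} = [\, -4 \;\; 2 \,] \]

The quantities $(A\mathbf{x})^T$ and $\mathbf{x}^T A^T$ are equal, by Theorem 3(d). Next, 

\[ \mathbf{x}\mathbf{x}^T = \begin{bmatrix} 5 \\ 3 \end{bmatrix}[\, 5 \;\; 3 \,] = \begin{bmatrix} 25 & 15 \\ 15 & 9 \end{bmatrix} \]

\[ \mathbf{x}^T\mathbf{x} = [\, 5 \;\; 3 \,]\begin{bmatrix} 5 \\ 3 \end{bmatrix} = [\, 25 + 9 \,] = 34 \]

A $1 \times 1$ matrix such as $\mathbf{x}^T\mathbf{x}$ is usually written without the brackets. Finally, $A^T\mathbf{x}^T$ is 
not defined, because $\mathbf{x}^T$ does not have two rows to match the two columns of $A^T$. 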

2. The fastest way to compute $A^2\mathbf{x}$ is to compute $A(A\mathbf{x})$. The product $A\mathbf{x}$ requires 
16 multiplications, 4 for each entry, and $A(A\mathbf{x})$ requires 16 more. In contrast, the 
product $A^2$ requires 64 multiplications, 4 for each of the 16 entries in $A^2$. After that, 
$A^2\mathbf{x}$ takes 16 more multiplications, for a total of 80. 
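
The counts in this solution can be confirmed by instrumenting the two computations. In the sketch below, the function names and the particular 4 x 4 matrix and vector are hypothetical choices; each function returns its result together with the number of scalar multiplications it performed.

```python
def mat_vec_counted(A, x):
    """Return (Ax, number of scalar multiplications used)."""
    count = 0
    result = []
    for row in A:
        total = 0
        for a, xi in zip(row, x):
            total += a * xi
            count += 1
        result.append(total)
    return result, count


def mat_mul_counted(A, B):
    """Return (AB, number of scalar multiplications used)."""
    count = 0
    product = []
    for row in A:
        new_row = []
        for j in range(len(B[0])):
            total = 0
            for k in range(len(B)):
                total += row[k] * B[k][j]
                count += 1
            new_row.append(total)
        product.append(new_row)
    return product, count


# Hypothetical 4 x 4 matrix and vector in R^4.
A = [[1, 2, 0, 1], [0, 1, 3, 0], [2, 0, 1, 1], [1, 1, 0, 2]]
x = [1, 0, 2, 1]

Ax, c1 = mat_vec_counted(A, x)
y_fast, c2 = mat_vec_counted(A, Ax)      # A(Ax)
A2, c3 = mat_mul_counted(A, A)
y_slow, c4 = mat_vec_counted(A2, x)      # (A^2)x

print(c1 + c2)            # 32
print(c3 + c4)            # 80
print(y_fast == y_slow)   # True
```
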

3. First observe that by the definition of matrix multiplication, 

\[ AB = [\, A\mathbf{b}_1 \;\; A\mathbf{b}_2 \;\; \cdots \;\; A\mathbf{b}_p \,] = [\, A\mathbf{b}_1 \;\; A\mathbf{b}_1 \;\; \cdots \;\; A\mathbf{b}_1 \,], \]

so the columns of $AB$ are identical. Next, recall that $\operatorname{row}_i(AB) = \operatorname{row}_i(A) \cdot B$. Since 
all the rows of $A$ are identical, all the rows of $AB$ are identical. Putting this information 
about the rows and columns together, it follows that all the entries in $AB$ are the 
same. 


2.2 THE INVERSE OF A MATRIX 


Matrix algebra provides tools for manipulating matrix equations and creating various 
useful formulas in ways similar to doing ordinary algebra with real numbers. This section 
investigates the matrix analogue of the reciprocal, or multiplicative inverse, of a nonzero 
number. 

Recall that the multiplicative inverse of a number such as 5 is $1/5$ or $5^{-1}$. This 
inverse satisfies the equations 

\[ 5^{-1} \cdot 5 = 1 \quad\text{and}\quad 5 \cdot 5^{-1} = 1 \]

The matrix generalization requires both equations and avoids the slanted-line notation 
(for division) because matrix multiplication is not commutative. Furthermore, a full 
generalization is possible only if the matrices involved are square.$^1$ 

An n x n matrix A is said to be invertible if there is an n x n matrix C such that 

CA = I and AC = I 

where $I = I_n$, the $n \times n$ identity matrix. In this case, $C$ is an inverse of $A$. In fact, $C$ 
is uniquely determined by $A$, because if $B$ were another inverse of $A$, then $B = BI = 
B(AC) = (BA)C = IC = C$. This unique inverse is denoted by $A^{-1}$, so that 

\[ A^{-1}A = I \quad\text{and}\quad AA^{-1} = I \]


A matrix that is not invertible is sometimes called a singular matrix, and an invertible 
matrix is called a nonsingular matrix. 


EXAMPLE 1 If A = 


AC = 
CA = 



Thus $C = A^{-1}$. 



Here is a simple formula for the inverse of a 2 x 2 matrix, along with a test to tell 
if the inverse exists. 


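THEOREM 4 

Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. If $ad - bc \neq 0$, then $A$ is invertible and 

\[ A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \]

If $ad - bc = 0$, then $A$ is not invertible. 


The simple proof of Theorem 4 is outlined in Exercises 25 and 26. The quantity 
$ad - bc$ is called the determinant of $A$, and we write 

\[ \det A = ad - bc \]

Theorem 4 says that a $2 \times 2$ matrix $A$ is invertible if and only if $\det A \neq 0$. 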
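
The 2 x 2 inverse formula can be sketched in plain Python with exact rational arithmetic. The function names (`inverse_2x2`, `mat_mul`) are assumptions for this illustration, the entries are assumed to be integers, and the sample matrix is the one whose inverse is computed in Example 2 of this section.

```python
from fractions import Fraction


def inverse_2x2(A):
    """Inverse of [[a, b], [c, d]] via (1/(ad - bc)) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("det A = 0, so A is not invertible")
    s = Fraction(1, det)          # exact 1/det for integer entries
    return [[s * d, -s * b], [-s * c, s * a]]


def mat_mul(A, B):
    """Product AB by the row-column rule (sizes assumed compatible)."""
    n = len(B)
    return [
        [sum(row[k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
        for row in A
    ]


A = [[3, 4], [5, 6]]              # det A = 3(6) - 4(5) = -2
A_inv = inverse_2x2(A)
print(A_inv)                      # entries -3, 2, 5/2, -3/2 as Fractions

# Both products should be the identity, as the definition of an inverse requires.
print(mat_mul(A, A_inv) == [[1, 0], [0, 1]])   # True
print(mat_mul(A_inv, A) == [[1, 0], [0, 1]])   # True
```
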


$^1$ One could say that an $m \times n$ matrix $A$ is invertible if there exist $n \times m$ matrices $C$ and $D$ such that 
$CA = I_n$ and $AD = I_m$. However, these equations imply that $A$ is square and $C = D$. Thus $A$ is invertible 
as defined above. See Exercises 23-25 in Section 2.1. 
EXAMPLE 2 Find the inverse of $A = \begin{bmatrix} 3 & 4 \\ 5 & 6 \end{bmatrix}$. 

SOLUTION Since $\det A = 3(6) - 4(5) = -2 \neq 0$, $A$ is invertible, and 

\[
A^{-1} = \frac{1}{-2}\begin{bmatrix} 6 & -4 \\ -5 & 3 \end{bmatrix}
= \begin{bmatrix} 6/(-2) & -4/(-2) \\ -5/(-2) & 3/(-2) \end{bmatrix}
= \begin{bmatrix} -3 & 2 \\ 5/2 & -3/2 \end{bmatrix}
\]


Invertible matrices are indispensable in linear algebra, mainly for algebraic calculations 
and formula derivations, as in the next theorem. There are also occasions when 
an inverse matrix provides insight into a mathematical model of a real-life situation, as 
in Example 3, below. 


THEOREM 5 

If $A$ is an invertible $n \times n$ matrix, then for each $\mathbf{b}$ in $\mathbb{R}^n$, the equation $A\mathbf{x} = \mathbf{b}$ has 
the unique solution $\mathbf{x} = A^{-1}\mathbf{b}$. 


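PROOF Take any $\mathbf{b}$ in $\mathbb{R}^n$. A solution exists because if $A^{-1}\mathbf{b}$ is substituted for $\mathbf{x}$, 
then $A\mathbf{x} = A(A^{-1}\mathbf{b}) = (AA^{-1})\mathbf{b} = I\mathbf{b} = \mathbf{b}$. So $A^{-1}\mathbf{b}$ is a solution. To prove that the 
solution is unique, show that if $\mathbf{u}$ is any solution, then $\mathbf{u}$, in fact, must be $A^{-1}\mathbf{b}$. Indeed, 
if $A\mathbf{u} = \mathbf{b}$, we can multiply both sides by $A^{-1}$ and obtain 

\[ A^{-1}A\mathbf{u} = A^{-1}\mathbf{b}, \quad I\mathbf{u} = A^{-1}\mathbf{b}, \quad\text{and}\quad \mathbf{u} = A^{-1}\mathbf{b} \] ■ 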

EXAMPLE 3 A horizontal elastic beam is supported at each end and is subjected 
to forces at points 1, 2, and 3, as shown in Figure 1. Let $\mathbf{f}$ in $\mathbb{R}^3$ list the forces at these 
points, and let $\mathbf{y}$ in $\mathbb{R}^3$ list the amounts of deflection (that is, movement) of the beam at 
the three points. Using Hooke’s law from physics, it can be shown that 

\[ \mathbf{y} = D\mathbf{f} \]

where $D$ is a flexibility matrix. Its inverse is called the stiffness matrix. Describe the 
physical significance of the columns of $D$ and $D^{-1}$. 


#1 #2 #3 



SOLUTION Write $I_3 = [\, \mathbf{e}_1 \;\; \mathbf{e}_2 \;\; \mathbf{e}_3 \,]$ and observe that 

\[ D = DI_3 = [\, D\mathbf{e}_1 \;\; D\mathbf{e}_2 \;\; D\mathbf{e}_3 \,] \]

Interpret the vector $\mathbf{e}_1 = (1, 0, 0)$ as a unit force applied downward at point 1 on the 
beam (with zero force at the other two points). Then $D\mathbf{e}_1$, the first column of $D$, lists 
the beam deflections due to a unit force at point 1. Similar descriptions apply to the 
second and third columns of $D$. 

To study the stiffness matrix $D^{-1}$, observe that the equation $\mathbf{f} = D^{-1}\mathbf{y}$ computes a 
force vector $\mathbf{f}$ when a deflection vector $\mathbf{y}$ is given. Write 

\[ D^{-1} = D^{-1}I_3 = [\, D^{-1}\mathbf{e}_1 \;\; D^{-1}\mathbf{e}_2 \;\; D^{-1}\mathbf{e}_3 \,] \]

Now interpret $\mathbf{e}_1$ as a deflection vector. Then $D^{-1}\mathbf{e}_1$ lists the forces that create the 
deflection. That is, the first column of $D^{-1}$ lists the forces that must be applied at the 
three points to produce a unit deflection at point 1 and zero deflections at the other points. 
Similarly, columns 2 and 3 of $D^{-1}$ list the forces required to produce unit deflections at 
points 2 and 3, respectively. In each column, one or two of the forces must be negative 
(point upward) to produce a unit deflection at the desired point and zero deflections at 
the other two points. If the flexibility is measured, for example, in inches of deflection 
per pound of load, then the stiffness matrix entries are given in pounds of load per inch 
of deflection. ■ 

The formula in Theorem 5 is seldom used to solve an equation $A\mathbf{x} = \mathbf{b}$ numerically 
because row reduction of $[\, A \;\; \mathbf{b} \,]$ is nearly always faster. (Row reduction is usually 
more accurate, too, when computations involve rounding off numbers.) One possible 
exception is the $2 \times 2$ case. In this case, mental computations to solve $A\mathbf{x} = \mathbf{b}$ are 
sometimes easier using the formula for $A^{-1}$, as in the next example. 


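EXAMPLE 4 Use the inverse of the matrix $A$ in Example 2 to solve the system 

\begin{align*}
3x_1 + 4x_2 &= 3 \\
5x_1 + 6x_2 &= 7
\end{align*}

SOLUTION This system is equivalent to $A\mathbf{x} = \mathbf{b}$, so 

\[
\mathbf{x} = A^{-1}\mathbf{b} = \begin{bmatrix} -3 & 2 \\ 5/2 & -3/2 \end{bmatrix}\begin{bmatrix} 3 \\ 7 \end{bmatrix} = \begin{bmatrix} 5 \\ -3 \end{bmatrix}
\]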
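
For comparison, the same system can be solved by row reduction of the augmented matrix, which is the approach the text recommends for numerical work. The following is a minimal sketch in plain Python with exact fractions; the function name `solve_by_row_reduction` is an assumption for this illustration, the routine handles only square systems, and it is not meant as production-quality numerical code.

```python
from fractions import Fraction


def solve_by_row_reduction(A, b):
    """Solve Ax = b for square A by reducing the augmented matrix [A b]."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("A is not invertible")
        M[col], M[pivot] = M[pivot], M[col]        # interchange
        p = M[col][col]
        M[col] = [v / p for v in M[col]]           # scale the pivot row
        for r in range(n):                         # row replacements
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]


A = [[3, 4], [5, 6]]
b = [3, 7]
x = solve_by_row_reduction(A, b)
print(x == [5, -3])    # True: x1 = 5, x2 = -3

# Check the solution against the original equations.
print(3 * x[0] + 4 * x[1], 5 * x[0] + 6 * x[1])    # 3 7
```
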





The next theorem provides three useful facts about invertible matrices. 


THEOREM 6 

a. If $A$ is an invertible matrix, then $A^{-1}$ is invertible and 

\[ (A^{-1})^{-1} = A \]

b. If $A$ and $B$ are $n \times n$ invertible matrices, then so is $AB$, and the inverse of $AB$ 
is the product of the inverses of $A$ and $B$ in the reverse order. That is, 

\[ (AB)^{-1} = B^{-1}A^{-1} \]

c. If $A$ is an invertible matrix, then so is $A^T$, and the inverse of $A^T$ is the transpose 
of $A^{-1}$. That is, 

\[ (A^T)^{-1} = (A^{-1})^T \]


PROOF To verify statement (a), find a matrix $C$ such that

$$A^{-1}C = I \quad \text{and} \quad CA^{-1} = I$$

In fact, these equations are satisfied with $A$ in place of $C$. Hence $A^{-1}$ is invertible, and
$A$ is its inverse. Next, to prove statement (b), compute:

$$(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I$$

A similar calculation shows that $(B^{-1}A^{-1})(AB) = I$. For statement (c), use Theorem
3(d), read from right to left: $(A^{-1})^T A^T = (AA^{-1})^T = I^T = I$. Similarly, $A^T (A^{-1})^T =
I^T = I$. Hence $A^T$ is invertible, and its inverse is $(A^{-1})^T$. ■
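The three statements of Theorem 6 can be spot-checked numerically. The sketch below is a sanity check on one pair of $2 \times 2$ matrices, not a proof; the matrix `B` and the helper names `matmul`, `transpose`, and `inv2` are assumptions made for illustration.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*X)]

def inv2(M):
    """Invert a 2x2 matrix with the ad - bc formula (exact arithmetic)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[3, 4], [5, 6]]
B = [[1, 2], [0, 1]]  # hypothetical second invertible matrix

print(inv2(inv2(A)) == A)                                # statement (a)
print(inv2(matmul(A, B)) == matmul(inv2(B), inv2(A)))    # statement (b)
print(inv2(transpose(A)) == transpose(inv2(A)))          # statement (c)
print(inv2(matmul(A, B)) == matmul(inv2(A), inv2(B)))    # wrong order fails here
```

The last line shows why the reverse order in statement (b) matters: for these two matrices, $A^{-1}B^{-1}$ is not the inverse of $AB$.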

Remark: Part (b) illustrates the important role that definitions play in proofs. The theorem
claims that $B^{-1}A^{-1}$ is the inverse of $AB$. The proof establishes this by showing
that $B^{-1}A^{-1}$ satisfies the definition of what it means to be the inverse of $AB$. Now, the
inverse of $AB$ is a matrix that, when multiplied on the left (or right) by $AB$, gives
the identity matrix $I$. So the proof consists of showing that $B^{-1}A^{-1}$ has this property.











The following generalization of Theorem 6(b) is needed later.

The product of $n \times n$ invertible matrices is invertible, and the inverse is the
product of their inverses in the reverse order.


There is an important connection between invertible matrices and row operations
that leads to a method for computing inverses. As we shall see, an invertible matrix $A$ is
row equivalent to an identity matrix, and we can find $A^{-1}$ by watching the row reduction
of $A$ to $I$.

Elementary Matrices 

An elementary matrix is one that is obtained by performing a single elementary row
operation on an identity matrix. The next example illustrates the three kinds of
elementary matrices.


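EXAMPLE 5 Let

$$E_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}, \quad E_2 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 5 \end{bmatrix}, \quad A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$$

Compute $E_1A$, $E_2A$, and $E_3A$, and describe how these products can be obtained by
elementary row operations on $A$.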


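SOLUTION Verify that

$$E_1A = \begin{bmatrix} a & b & c \\ d & e & f \\ g - 4a & h - 4b & i - 4c \end{bmatrix}, \quad E_2A = \begin{bmatrix} d & e & f \\ a & b & c \\ g & h & i \end{bmatrix}, \quad E_3A = \begin{bmatrix} a & b & c \\ d & e & f \\ 5g & 5h & 5i \end{bmatrix}$$

Addition of $-4$ times row 1 of $A$ to row 3 produces $E_1A$. (This is a row replacement
operation.) An interchange of rows 1 and 2 of $A$ produces $E_2A$, and multiplication of
row 3 of $A$ by 5 produces $E_3A$. ■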
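The effect of the three elementary matrices of Example 5 can be seen on a concrete matrix. The sketch below uses a hypothetical numeric stand-in for $A$ (the example itself keeps the entries symbolic) and a small helper `matmul` that is an assumption of this illustration.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

E1 = [[1, 0, 0], [0, 1, 0], [-4, 0, 1]]  # adds -4 times row 1 to row 3
E2 = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]   # interchanges rows 1 and 2
E3 = [[1, 0, 0], [0, 1, 0], [0, 0, 5]]   # multiplies row 3 by 5

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]    # hypothetical numeric stand-in for A

print(matmul(E1, A))  # [[1, 2, 3], [4, 5, 6], [3, 0, -3]]
print(matmul(E2, A))  # [[4, 5, 6], [1, 2, 3], [7, 8, 9]]
print(matmul(E3, A))  # [[1, 2, 3], [4, 5, 6], [35, 40, 45]]
```

Each product agrees with performing the corresponding row operation directly on the rows of the stand-in matrix.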


Left-multiplication (that is, multiplication on the left) by $E_1$ in Example 5 has the
same effect on any $3 \times n$ matrix. It adds $-4$ times row 1 to row 3. In particular, since
$E_1 \cdot I = E_1$, we see that $E_1$ itself is produced by this same row operation on the identity.
Thus Example 5 illustrates the following general fact about elementary matrices. See
Exercises 27 and 28.


If an elementary row operation is performed on an $m \times n$ matrix $A$, the resulting
matrix can be written as $EA$, where the $m \times m$ matrix $E$ is created by performing
the same row operation on $I_m$.

Since row operations are reversible, as shown in Section 1.1, elementary matrices
are invertible, for if $E$ is produced by a row operation on $I$, then there is another row
operation of the same type that changes $E$ back into $I$. Hence there is an elementary matrix
$F$ such that $FE = I$. Since $E$ and $F$ correspond to reverse operations, $EF = I$, too.


Each elementary matrix $E$ is invertible. The inverse of $E$ is the elementary matrix
of the same type that transforms $E$ back into $I$.


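EXAMPLE 6 Find the inverse of $E_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}$.

SOLUTION To transform $E_1$ into $I$, add $+4$ times row 1 to row 3. The elementary
matrix that does this is

$$E_1^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ +4 & 0 & 1 \end{bmatrix}$$ ■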


The following theorem provides the best way to “visualize” an invertible matrix, 
and the theorem leads immediately to a method for finding the inverse of a matrix. 


THEOREM 7

An $n \times n$ matrix $A$ is invertible if and only if $A$ is row equivalent to $I_n$, and in
this case, any sequence of elementary row operations that reduces $A$ to $I_n$ also
transforms $I_n$ into $A^{-1}$.


Remark: The comment on the proof of Theorem 11 in Chapter 1 noted that “P if and
only if Q” is equivalent to two statements: (1) “If P then Q” and (2) “If Q then P.”
The second statement is called the converse of the first and explains the use of the word
conversely in the second paragraph of this proof.

PROOF Suppose that $A$ is invertible. Then, since the equation $A\mathbf{x} = \mathbf{b}$ has a solution
for each $\mathbf{b}$ (Theorem 5), $A$ has a pivot position in every row (Theorem 4 in Section 1.4).
Because $A$ is square, the $n$ pivot positions must be on the diagonal, which implies that
the reduced echelon form of $A$ is $I_n$. That is, $A \sim I_n$.

Now suppose, conversely, that $A \sim I_n$. Then, since each step of the row reduction
of $A$ corresponds to left-multiplication by an elementary matrix, there exist elementary
matrices $E_1, \ldots, E_p$ such that

$$A \sim E_1A \sim E_2(E_1A) \sim \cdots \sim E_p(E_{p-1} \cdots E_1A) = I_n$$

That is,

$$E_p \cdots E_1A = I_n \qquad (1)$$

Since the product $E_p \cdots E_1$ of invertible matrices is invertible, (1) leads to

$$(E_p \cdots E_1)^{-1}(E_p \cdots E_1)A = (E_p \cdots E_1)^{-1}I_n$$

$$A = (E_p \cdots E_1)^{-1}$$

Thus $A$ is invertible, as it is the inverse of an invertible matrix (Theorem 6). Also,

$$A^{-1} = [(E_p \cdots E_1)^{-1}]^{-1} = E_p \cdots E_1$$

Then $A^{-1} = E_p \cdots E_1 \cdot I_n$, which says that $A^{-1}$ results from applying $E_1, \ldots, E_p$
successively to $I_n$. This is the same sequence in (1) that reduced $A$ to $I_n$. ■

An Algorithm for Finding $A^{-1}$

If we place $A$ and $I$ side by side to form an augmented matrix $[\,A \;\; I\,]$, then row
operations on this matrix produce identical operations on $A$ and on $I$. By Theorem 7,
either there are row operations that transform $A$ to $I_n$ and $I_n$ to $A^{-1}$ or else $A$ is not
invertible.


ALGORITHM FOR FINDING $A^{-1}$

Row reduce the augmented matrix $[\,A \;\; I\,]$. If $A$ is row equivalent to $I$, then
$[\,A \;\; I\,]$ is row equivalent to $[\,I \;\; A^{-1}\,]$. Otherwise, $A$ does not have an inverse.


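EXAMPLE 7 Find the inverse of the matrix $A = \begin{bmatrix} 0 & 1 & 2 \\ 1 & 0 & 3 \\ 4 & -3 & 8 \end{bmatrix}$, if it exists.

SOLUTION

$$[\,A \;\; I\,] = \begin{bmatrix} 0 & 1 & 2 & 1 & 0 & 0 \\ 1 & 0 & 3 & 0 & 1 & 0 \\ 4 & -3 & 8 & 0 & 0 & 1 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 3 & 0 & 1 & 0 \\ 0 & 1 & 2 & 1 & 0 & 0 \\ 4 & -3 & 8 & 0 & 0 & 1 \end{bmatrix}$$

$$\sim \begin{bmatrix} 1 & 0 & 3 & 0 & 1 & 0 \\ 0 & 1 & 2 & 1 & 0 & 0 \\ 0 & -3 & -4 & 0 & -4 & 1 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 3 & 0 & 1 & 0 \\ 0 & 1 & 2 & 1 & 0 & 0 \\ 0 & 0 & 2 & 3 & -4 & 1 \end{bmatrix}$$

$$\sim \begin{bmatrix} 1 & 0 & 3 & 0 & 1 & 0 \\ 0 & 1 & 2 & 1 & 0 & 0 \\ 0 & 0 & 1 & 3/2 & -2 & 1/2 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 0 & -9/2 & 7 & -3/2 \\ 0 & 1 & 0 & -2 & 4 & -1 \\ 0 & 0 & 1 & 3/2 & -2 & 1/2 \end{bmatrix}$$

Theorem 7 shows, since $A \sim I$, that $A$ is invertible, and

$$A^{-1} = \begin{bmatrix} -9/2 & 7 & -3/2 \\ -2 & 4 & -1 \\ 3/2 & -2 & 1/2 \end{bmatrix}$$

It is a good idea to check the final answer:

$$AA^{-1} = \begin{bmatrix} 0 & 1 & 2 \\ 1 & 0 & 3 \\ 4 & -3 & 8 \end{bmatrix} \begin{bmatrix} -9/2 & 7 & -3/2 \\ -2 & 4 & -1 \\ 3/2 & -2 & 1/2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

It is not necessary to check that $A^{-1}A = I$ since $A$ is invertible. ■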
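The algorithm for finding $A^{-1}$ can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a production routine: it uses exact rational arithmetic from the standard `fractions` module (so rounding is not an issue), and the function name `inverse` and its convention of returning `None` for a noninvertible matrix are choices made here.

```python
from fractions import Fraction

def inverse(A):
    """Row reduce [A I] to [I A^{-1}]; return None if A is not row equivalent to I."""
    n = len(A)
    # Build the augmented matrix [A I] with exact rational entries.
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Look for a nonzero entry in this column, at or below the diagonal.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None  # no pivot in this column, so A has no inverse
        M[col], M[pivot] = M[pivot], M[col]           # interchange
        p = M[col][col]
        M[col] = [v / p for v in M[col]]              # scaling
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]  # replacement
    # The right half of the reduced matrix is the inverse.
    return [row[n:] for row in M]

# The matrix of Example 7.
for row in inverse([[0, 1, 2], [1, 0, 3], [4, -3, 8]]):
    print([str(v) for v in row])
```

The order of row operations chosen by this sketch may differ from the hand computation in Example 7, but by Theorem 7 any sequence that reduces $A$ to $I$ produces the same $A^{-1}$.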


Another View of Matrix Inversion

Denote the columns of $I_n$ by $\mathbf{e}_1, \ldots, \mathbf{e}_n$. Then row reduction of $[\,A \;\; I\,]$ to $[\,I \;\; A^{-1}\,]$
can be viewed as the simultaneous solution of the $n$ systems

$$A\mathbf{x} = \mathbf{e}_1, \quad A\mathbf{x} = \mathbf{e}_2, \quad \ldots, \quad A\mathbf{x} = \mathbf{e}_n \qquad (2)$$

where the “augmented columns” of these systems have all been placed next to $A$ to form
$[\,A \;\; \mathbf{e}_1 \;\; \mathbf{e}_2 \;\; \cdots \;\; \mathbf{e}_n\,] = [\,A \;\; I\,]$. The equation $AA^{-1} = I$ and the definition of matrix
multiplication show that the columns of $A^{-1}$ are precisely the solutions of the systems
in (2). This observation is useful because some applied problems may require finding
only one or two columns of $A^{-1}$. In this case, only the corresponding systems in (2)
need be solved.
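The observation about individual columns can be illustrated with a short sketch: to obtain column $j$ of $A^{-1}$, solve only $A\mathbf{x} = \mathbf{e}_j$. The helper names `solve` and `inverse_column` are hypothetical, the column index is zero-based as is usual in Python, and the code assumes $A$ is square and works in exact rational arithmetic.

```python
from fractions import Fraction

def solve(A, b):
    """Solve Ax = b for square A by row reducing the augmented matrix [A b]."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("A is not invertible")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [row[n] for row in M]

def inverse_column(A, j):
    """Return column j (zero-based) of A^{-1} by solving A x = e_j only."""
    e_j = [1 if i == j else 0 for i in range(len(A))]
    return solve(A, e_j)

# Third column of the inverse of the matrix in Example 7.
A = [[0, 1, 2], [1, 0, 3], [4, -3, 8]]
print([str(v) for v in inverse_column(A, 2)])  # ['-3/2', '-1', '1/2']
```

The result matches the third column of $A^{-1}$ found in Example 7, without computing the other two columns.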


NUMERICAL NOTE

In practical work, $A^{-1}$ is seldom computed, unless the entries of $A^{-1}$ are needed.
Computing both $A^{-1}$ and $A^{-1}\mathbf{b}$ takes about three times as many arithmetic
operations as solving $A\mathbf{x} = \mathbf{b}$ by row reduction, and row reduction may be more
accurate.


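PRACTICE PROBLEMS

1. Use determinants to determine which of the following matrices are invertible.

a. $\begin{bmatrix} 3 & -9 \\ 2 & 6 \end{bmatrix}$ b. $\begin{bmatrix} 4 & -9 \\ 0 & 5 \end{bmatrix}$ c. $\begin{bmatrix} 6 & -9 \\ -4 & 6 \end{bmatrix}$

2. Find the inverse of the matrix $A = \begin{bmatrix} 1 & -2 & -1 \\ -1 & 5 & 6 \\ 5 & -4 & 5 \end{bmatrix}$, if it exists.

3. If $A$ is an invertible matrix, prove that $5A$ is an invertible matrix.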


2.2 EXERCISES 


Find the inverses of the matrices in Exercises 1-4. 



5. Use the inverse found in Exercise 1 to solve the system

$$\begin{aligned} 8x_1 + 6x_2 &= 2 \\ 5x_1 + 4x_2 &= -1 \end{aligned}$$

6. Use the inverse found in Exercise 3 to solve the system

$$\begin{aligned} 8x_1 + 5x_2 &= -9 \\ -7x_1 - 5x_2 &= 11 \end{aligned}$$

7. Let A = 




and b 4 = 




a. Find $A^{-1}$, and use it to solve the four equations $A\mathbf{x} = \mathbf{b}_1$,
$A\mathbf{x} = \mathbf{b}_2$, $A\mathbf{x} = \mathbf{b}_3$, $A\mathbf{x} = \mathbf{b}_4$.

b. The four equations in part (a) can be solved by the same
set of row operations, since the coefficient matrix is the
same in each case. Solve the four equations in part (a) by
row reducing the augmented matrix $[\,A \;\; \mathbf{b}_1 \;\; \mathbf{b}_2 \;\; \mathbf{b}_3 \;\; \mathbf{b}_4\,]$.


8. Use matrix algebra to show that if $A$ is invertible and $D$
satisfies $AD = I$, then $D = A^{-1}$.

In Exercises 9 and 10, mark each statement True or False. Justify
each answer.

9. a. In order for a matrix $B$ to be the inverse of $A$, both
equations $AB = I$ and $BA = I$ must be true.

b. If $A$ and $B$ are $n \times n$ and invertible, then $A^{-1}B^{-1}$ is the
inverse of $AB$.

c. If $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ and $ab - cd \neq 0$, then $A$ is invertible.

d. If $A$ is an invertible $n \times n$ matrix, then the equation
$A\mathbf{x} = \mathbf{b}$ is consistent for each $\mathbf{b}$ in $\mathbb{R}^n$.

e. Each elementary matrix is invertible.

10. a. A product of invertible $n \times n$ matrices is invertible, and
the inverse of the product is the product of their inverses
in the same order.

b. If $A$ is invertible, then the inverse of $A^{-1}$ is $A$ itself.

c. If $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ and $ad = bc$, then $A$ is not invertible.

d. If $A$ can be row reduced to the identity matrix, then $A$ must
be invertible.

e. If $A$ is invertible, then elementary row operations that
reduce $A$ to the identity $I_n$ also reduce $A^{-1}$ to $I_n$.

11. Let $A$ be an invertible $n \times n$ matrix, and let $B$ be an $n \times p$
matrix. Show that the equation $AX = B$ has a unique
solution $A^{-1}B$.

12. Let $A$ be an invertible $n \times n$ matrix, and let $B$ be an $n \times p$
matrix. Explain why $A^{-1}B$ can be computed by row reduction:

If $[\,A \;\; B\,] \sim \cdots \sim [\,I \;\; X\,]$, then $X = A^{-1}B$.

If $A$ is larger than $2 \times 2$, then row reduction of $[\,A \;\; B\,]$ is much
faster than computing both $A^{-1}$ and $A^{-1}B$.

13. Suppose $AB = AC$, where $B$ and $C$ are $n \times p$ matrices and $A$
is invertible. Show that $B = C$. Is this true, in general, when
$A$ is not invertible?

14. Suppose $(B - C)D = 0$, where $B$ and $C$ are $m \times n$ matrices
and $D$ is invertible. Show that $B = C$.

15. Suppose $A$, $B$, and $C$ are invertible $n \times n$ matrices. Show that
$ABC$ is also invertible by producing a matrix $D$ such that
$(ABC)D = I$ and $D(ABC) = I$.

16. Suppose $A$ and $B$ are $n \times n$, $B$ is invertible, and $AB$ is
invertible. Show that $A$ is invertible. [Hint: Let $C = AB$, and solve
this equation for $A$.]

17. Solve the equation $AB = BC$ for $A$, assuming that $A$, $B$, and
$C$ are square and $B$ is invertible.

18. Suppose $P$ is invertible and $A = PBP^{-1}$. Solve for $B$ in
terms of $A$.

19. If $A$, $B$, and $C$ are $n \times n$ invertible matrices, does the equation
$C^{-1}(A + X)B^{-1} = I_n$ have a solution, $X$? If so, find it.

20. Suppose $A$, $B$, and $X$ are $n \times n$ matrices with $A$, $X$, and
$A - AX$ invertible, and suppose

$$(A - AX)^{-1} = X^{-1}B \qquad (3)$$

a. Explain why $B$ is invertible.

b. Solve (3) for $X$. If you need to invert a matrix, explain
why that matrix is invertible.


$3 \times 3$ matrix and $I = I_3$. (A general proof would require slightly
more notation.)


27. a. Use equation (1) from Section 2.1 to show that
$\text{row}_i(A) = \text{row}_i(I) \cdot A$, for $i = 1, 2, 3$.

b. Show that if rows 1 and 2 of $A$ are interchanged, then the
result may be written as $EA$, where $E$ is an elementary
matrix formed by interchanging rows 1 and 2 of $I$.

c. Show that if row 3 of $A$ is multiplied by 5, then the result
may be written as $EA$, where $E$ is formed by multiplying
row 3 of $I$ by 5.

28. Show that if row 3 of $A$ is replaced by $\text{row}_3(A) - 4 \cdot \text{row}_1(A)$,
the result is $EA$, where $E$ is formed from $I$ by replacing
$\text{row}_3(I)$ by $\text{row}_3(I) - 4 \cdot \text{row}_1(I)$.


Find the inverses of the matrices in Exercises 29-32, if they exist. 
Use the algorithm introduced in this section. 



33. Use the algorithm from this section to find the inverses of 



Let $A$ be the corresponding $n \times n$ matrix, and let $B$ be its
inverse. Guess the form of $B$, and then prove that $AB = I$
and $BA = I$.


21. Explain why the columns of an $n \times n$ matrix $A$ are linearly
independent when $A$ is invertible.

22. Explain why the columns of an $n \times n$ matrix $A$ span $\mathbb{R}^n$ when
$A$ is invertible. [Hint: Review Theorem 4 in Section 1.4.]

23. Suppose $A$ is $n \times n$ and the equation $A\mathbf{x} = \mathbf{0}$ has only the
trivial solution. Explain why $A$ has $n$ pivot columns and $A$ is
row equivalent to $I_n$. By Theorem 7, this shows that $A$ must
be invertible. (This exercise and Exercise 24 will be cited in
Section 2.3.)

24. Suppose $A$ is $n \times n$ and the equation $A\mathbf{x} = \mathbf{b}$ has a solution
for each $\mathbf{b}$ in $\mathbb{R}^n$. Explain why $A$ must be invertible. [Hint: Is
$A$ row equivalent to $I_n$?]

Exercises 25 and 26 prove Theorem 4 for $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$.

25. Show that if $ad - bc = 0$, then the equation $A\mathbf{x} = \mathbf{0}$ has
more than one solution. Why does this imply that $A$ is not
invertible? [Hint: First, consider $a = b = 0$. Then, if $a$ and
$b$ are not both zero, consider the vector $\mathbf{x} = \begin{bmatrix} -b \\ a \end{bmatrix}$.]

26. Show that if $ad - bc \neq 0$, the formula for $A^{-1}$ works.


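34. Repeat the strategy of Exercise 33 to guess the inverse of

$$A = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & 2 & 0 & \cdots & 0 \\ 1 & 2 & 3 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 1 & 2 & 3 & \cdots & n \end{bmatrix}$$

Prove that your guess is correct.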


35. Let $A = \begin{bmatrix} -2 & -7 & -9 \\ 2 & 5 & 6 \\ 1 & 3 & 4 \end{bmatrix}$. Find the third column of $A^{-1}$
without computing the other columns.


36. [M] Let $A = \begin{bmatrix} -25 & -9 & -27 \\ 546 & 180 & 537 \\ 154 & 50 & 149 \end{bmatrix}$. Find the second and
third columns of $A^{-1}$ without computing the first column.


37. Let $A = \begin{bmatrix} 1 & 2 \\ 1 & 3 \\ 1 & 5 \end{bmatrix}$. Construct a $2 \times 3$ matrix $C$ (by trial and
error) using only 1, $-1$, and 0 as entries, such that $CA = I_2$.
Compute $AC$ and note that $AC \neq I_3$.


Exercises 27 and 28 prove special cases of the facts about elementary
matrices stated in the box following Example 5. Here A is a


38. Let A = 



1 1 
0 1 



Construct a $4 \times 2$ matrix $D$
using only 1 and 0 as entries, such that $AD = I_2$. Is it possible
that $CA = I_4$ for some $4 \times 2$ matrix $C$? Why or why not?


39. Let $D = \begin{bmatrix} .005 & .002 & .001 \\ .002 & .004 & .002 \\ .001 & .002 & .005 \end{bmatrix}$ be a flexibility matrix,
with flexibility measured in inches per pound. Suppose
that forces of 30, 50, and 20 lb are applied at points 1,
2, and 3, respectively, in Figure 1 of Example 3. Find the
corresponding deflections.

40. [M] Compute the stiffness matrix $D^{-1}$ for $D$ in Exercise 39.
List the forces needed to produce a deflection of .04 in. at
point 3, with zero deflections at the other points.


41. [M] Let $D = \begin{bmatrix} .0040 & .0030 & .0010 & .0005 \\ .0030 & .0050 & .0030 & .0010 \\ .0010 & .0030 & .0050 & .0030 \\ .0005 & .0010 & .0030 & .0040 \end{bmatrix}$ be a
flexibility matrix for an elastic beam with four points at which
force is applied. Units are centimeters per newton of force.
Measurements at the four points show deflections of .08, .12,
.16, and .12 cm. Determine the forces at the four points.

[Figure: Deflection of elastic beam in Exercises 41 and 42, with points labeled #1, #2, #3, #4.]

42. [M] With $D$ as in Exercise 41, determine the forces that
produce a deflection of .24 cm at the second point on the
beam, with zero deflections at the other three points. How is
the answer related to the entries in $D^{-1}$? [Hint: First answer
the question when the deflection is 1 cm at the second point.]


SOLUTIONS TO PRACTICE PROBLEMS

1. a. $\det \begin{bmatrix} 3 & -9 \\ 2 & 6 \end{bmatrix} = 3 \cdot 6 - (-9) \cdot 2 = 18 + 18 = 36$. The determinant is nonzero, so
the matrix is invertible.

b. $\det \begin{bmatrix} 4 & -9 \\ 0 & 5 \end{bmatrix} = 4 \cdot 5 - (-9) \cdot 0 = 20 \neq 0$. The matrix is invertible.

c. $\det \begin{bmatrix} 6 & -9 \\ -4 & 6 \end{bmatrix} = 6 \cdot 6 - (-9)(-4) = 36 - 36 = 0$. The matrix is not invertible.

2. $$[\,A \;\; I\,] = \begin{bmatrix} 1 & -2 & -1 & 1 & 0 & 0 \\ -1 & 5 & 6 & 0 & 1 & 0 \\ 5 & -4 & 5 & 0 & 0 & 1 \end{bmatrix} \sim \begin{bmatrix} 1 & -2 & -1 & 1 & 0 & 0 \\ 0 & 3 & 5 & 1 & 1 & 0 \\ 0 & 6 & 10 & -5 & 0 & 1 \end{bmatrix} \sim \begin{bmatrix} 1 & -2 & -1 & 1 & 0 & 0 \\ 0 & 3 & 5 & 1 & 1 & 0 \\ 0 & 0 & 0 & -7 & -2 & 1 \end{bmatrix}$$

So $[\,A \;\; I\,]$ is row equivalent to a matrix of the form $[\,B \;\; D\,]$, where $B$ is square
and has a row of zeros. Further row operations will not transform $B$ into $I$, so we
stop. $A$ does not have an inverse.

3. Since $A$ is an invertible matrix, there exists a matrix $C$ such that $AC = I = CA$. The
goal is to find a matrix $D$ so that $(5A)D = I = D(5A)$. Set $D = \tfrac{1}{5}C$. Applying
Theorem 2 from Section 2.1 establishes that $(5A)(\tfrac{1}{5}C) = (5)(\tfrac{1}{5})(AC) = 1 \cdot I = I$,
and $(\tfrac{1}{5}C)(5A) = (\tfrac{1}{5})(5)(CA) = 1 \cdot I = I$. Thus $\tfrac{1}{5}C$ is indeed the inverse of
$5A$, proving that $5A$ is invertible.


2.3 CHARACTERIZATIONS OF INVERTIBLE MATRICES 


This section provides a review of most of the concepts introduced in Chapter 1, in
relation to systems of $n$ linear equations in $n$ unknowns and to square matrices. The
main result is Theorem 8.

THEOREM 8

The Invertible Matrix Theorem

Let $A$ be a square $n \times n$ matrix. Then the following statements are equivalent.
That is, for a given $A$, the statements are either all true or all false.

a. $A$ is an invertible matrix.

b. $A$ is row equivalent to the $n \times n$ identity matrix.

c. $A$ has $n$ pivot positions.

d. The equation $A\mathbf{x} = \mathbf{0}$ has only the trivial solution.

e. The columns of $A$ form a linearly independent set.

f. The linear transformation $\mathbf{x} \mapsto A\mathbf{x}$ is one-to-one.

g. The equation $A\mathbf{x} = \mathbf{b}$ has at least one solution for each $\mathbf{b}$ in $\mathbb{R}^n$.

h. The columns of $A$ span $\mathbb{R}^n$.

i. The linear transformation $\mathbf{x} \mapsto A\mathbf{x}$ maps $\mathbb{R}^n$ onto $\mathbb{R}^n$.

j. There is an $n \times n$ matrix $C$ such that $CA = I$.

k. There is an $n \times n$ matrix $D$ such that $AD = I$.

l. $A^T$ is an invertible matrix.

FIGURE 1 Implications established in the proof: (a) ⇒ (j) ⇒ (d) ⇒ (c) ⇒ (b) ⇒ (a); (a) ⇒ (k) ⇒ (g) ⇒ (a); (g) ⇔ (h) ⇔ (i); (d) ⇔ (e) ⇔ (f); (a) ⇔ (l).


First, we need some notation. If the truth of statement (a) always implies that
statement (j) is true, we say that (a) implies (j) and write (a) ⇒ (j). The proof will establish
the “circle” of implications shown in Figure 1. If any one of these five statements is
true, then so are the others. Finally, the proof will link the remaining statements of the
theorem to the statements in this circle.

PROOF If statement (a) is true, then $A^{-1}$ works for $C$ in (j), so (a) ⇒ (j). Next, (j) ⇒ (d)
by Exercise 23 in Section 2.1. (Turn back and read the exercise.) Also, (d) ⇒ (c) by
Exercise 23 in Section 2.2. If $A$ is square and has $n$ pivot positions, then the pivots
must lie on the main diagonal, in which case the reduced echelon form of $A$ is $I_n$. Thus
(c) ⇒ (b). Also, (b) ⇒ (a) by Theorem 7 in Section 2.2. This completes the circle in
Figure 1.

Next, (a) ⇒ (k) because $A^{-1}$ works for $D$. Also, (k) ⇒ (g) by Exercise 24 in
Section 2.1, and (g) ⇒ (a) by Exercise 24 in Section 2.2. So (k) and (g) are linked to
the circle. Further, (g), (h), and (i) are equivalent for any matrix, by Theorem 4 in
Section 1.4 and Theorem 12(a) in Section 1.9. Thus, (h) and (i) are linked through (g) to
the circle.

Since (d) is linked to the circle, so are (e) and (f), because (d), (e), and (f) are all
equivalent for any matrix $A$. (See Section 1.7 and Theorem 12(b) in Section 1.9.) Finally,
(a) ⇒ (l) by Theorem 6(c) in Section 2.2, and (l) ⇒ (a) by the same theorem with $A$ and
$A^T$ interchanged. This completes the proof. ■

Because of Theorem 5 in Section 2.2, statement (g) in Theorem 8 could also be
written as “The equation $A\mathbf{x} = \mathbf{b}$ has a unique solution for each $\mathbf{b}$ in $\mathbb{R}^n$.” This statement
certainly implies (b) and hence implies that $A$ is invertible.

The next fact follows from Theorem 8 and Exercise 8 in Section 2.2.


Let $A$ and $B$ be square matrices. If $AB = I$, then $A$ and $B$ are both invertible,
with $B = A^{-1}$ and $A = B^{-1}$.

The Invertible Matrix Theorem divides the set of all $n \times n$ matrices into two disjoint
classes: the invertible (nonsingular) matrices, and the noninvertible (singular) matrices.
Each statement in the theorem describes a property of every $n \times n$ invertible matrix.
The negation of a statement in the theorem describes a property of every $n \times n$ singular
matrix. For instance, an $n \times n$ singular matrix is not row equivalent to $I_n$, does not have
$n$ pivot positions, and has linearly dependent columns. Negations of other statements
are considered in the exercises.


EXAMPLE 1 


Use the Invertible Matrix Theorem to decide if A is invertible: 


SOLUTION 






So A has three pivot positions and hence is invertible, by the Invertible Matrix Theorem, 
statement (c). ■ 

The power of the Invertible Matrix Theorem lies in the connections it provides
among so many important concepts, such as linear independence of columns of a matrix
$A$ and the existence of solutions to equations of the form $A\mathbf{x} = \mathbf{b}$. It should be
emphasized, however, that the Invertible Matrix Theorem applies only to square matrices. For
example, if the columns of a $4 \times 3$ matrix are linearly independent, we cannot use the
Invertible Matrix Theorem to conclude anything about the existence or nonexistence of
solutions to equations of the form $A\mathbf{x} = \mathbf{b}$.
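Statement (c) of the Invertible Matrix Theorem suggests a simple computational test: count pivot positions. The following sketch is an illustration only, assuming exact rational arithmetic; the names `pivot_count` and `is_invertible` are hypothetical. It also refuses non-square input, reflecting the warning that the theorem applies only to square matrices.

```python
from fractions import Fraction

def pivot_count(A):
    """Count pivot positions by reducing A to an echelon form."""
    M = [[Fraction(v) for v in row] for row in A]
    rows, cols = len(M), len(M[0])
    r = 0  # index of the next pivot row
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            M[i] = [v - f * w for v, w in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r

def is_invertible(A):
    """Apply statement (c): a square n x n matrix is invertible iff it has n pivots."""
    n = len(A)
    if any(len(row) != n for row in A):
        raise ValueError("the Invertible Matrix Theorem applies only to square matrices")
    return pivot_count(A) == n

print(is_invertible([[0, 1, 2], [1, 0, 3], [4, -3, 8]]))   # True
print(is_invertible([[2, 3, 4], [2, 3, 4], [2, 3, 4]]))    # False
```

A $4 \times 3$ matrix with linearly independent columns has three pivots, but `is_invertible` raises an error for it rather than drawing a conclusion.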


Invertible Linear Transformations 

Recall from Section 2.1 that matrix multiplication corresponds to composition of linear
transformations. When a matrix $A$ is invertible, the equation $A^{-1}A\mathbf{x} = \mathbf{x}$ can be viewed
as a statement about linear transformations. See Figure 2.

FIGURE 2 $A^{-1}$ transforms $A\mathbf{x}$ back to $\mathbf{x}$.

A linear transformation $T : \mathbb{R}^n \to \mathbb{R}^n$ is said to be invertible if there exists a
function $S : \mathbb{R}^n \to \mathbb{R}^n$ such that

$$S(T(\mathbf{x})) = \mathbf{x} \quad \text{for all } \mathbf{x} \text{ in } \mathbb{R}^n \qquad (1)$$

$$T(S(\mathbf{x})) = \mathbf{x} \quad \text{for all } \mathbf{x} \text{ in } \mathbb{R}^n \qquad (2)$$

The next theorem shows that if such an $S$ exists, it is unique and must be a linear
transformation. We call $S$ the inverse of $T$ and write it as $T^{-1}$.

THEOREM 9

Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a linear transformation and let $A$ be the standard matrix for
$T$. Then $T$ is invertible if and only if $A$ is an invertible matrix. In that case, the
linear transformation $S$ given by $S(\mathbf{x}) = A^{-1}\mathbf{x}$ is the unique function satisfying
equations (1) and (2).


Remark: See the comment on the proof of Theorem 7.

PROOF Suppose that $T$ is invertible. Then (2) shows that $T$ is onto $\mathbb{R}^n$, for if $\mathbf{b}$ is in
$\mathbb{R}^n$ and $\mathbf{x} = S(\mathbf{b})$, then $T(\mathbf{x}) = T(S(\mathbf{b})) = \mathbf{b}$, so each $\mathbf{b}$ is in the range of $T$. Thus $A$ is
invertible, by the Invertible Matrix Theorem, statement (i).

Conversely, suppose that $A$ is invertible, and let $S(\mathbf{x}) = A^{-1}\mathbf{x}$. Then, $S$ is a linear
transformation, and $S$ obviously satisfies (1) and (2). For instance,

$$S(T(\mathbf{x})) = S(A\mathbf{x}) = A^{-1}(A\mathbf{x}) = \mathbf{x}$$

Thus $T$ is invertible. The proof that $S$ is unique is outlined in Exercise 39. ■
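Equations (1) and (2) can be checked on a small example. In the sketch below, the standard matrix `A` and its inverse `A_inv` are hypothetical values chosen for illustration (they are not taken from the text), and `apply`, `T`, and `S` are names introduced here.

```python
A = [[2, 1], [1, 1]]         # hypothetical standard matrix of T
A_inv = [[1, -1], [-1, 2]]   # its inverse, since A times A_inv is the identity

def apply(M, x):
    """Compute the matrix-vector product Mx."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def T(x):
    """The linear transformation x -> Ax."""
    return apply(A, x)

def S(x):
    """The candidate inverse transformation x -> A^{-1}x."""
    return apply(A_inv, x)

x = [3, -5]
print(S(T(x)))  # [3, -5], equation (1)
print(T(S(x)))  # [3, -5], equation (2)
```

Checking one vector does not prove invertibility, of course; Theorem 9 guarantees that the two equations hold for every $\mathbf{x}$ once $A$ is invertible.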

EXAMPLE 2 What can you say about a one-to-one linear transformation $T$ from
$\mathbb{R}^n$ into $\mathbb{R}^n$?


SOLUTION The columns of the standard matrix $A$ of $T$ are linearly independent (by
Theorem 12 in Section 1.9). So $A$ is invertible, by the Invertible Matrix Theorem, and
$T$ maps $\mathbb{R}^n$ onto $\mathbb{R}^n$. Also, $T$ is invertible, by Theorem 9. ■


NUMERICAL NOTES

In practical work, you might occasionally encounter a “nearly singular” or
ill-conditioned matrix: an invertible matrix that can become singular if some of
its entries are changed ever so slightly. In this case, row reduction may produce
fewer than $n$ pivot positions, as a result of roundoff error. Also, roundoff error
can sometimes make a singular matrix appear to be invertible.

Some matrix programs will compute a condition number for a square
matrix. The larger the condition number, the closer the matrix is to being singular.
The condition number of the identity matrix is 1. A singular matrix has an
infinite condition number. In extreme cases, a matrix program may not be able to
distinguish between a singular matrix and an ill-conditioned matrix.

Exercises 41-45 show that matrix computations can produce substantial
error when a condition number is large.
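The text does not say which definition of condition number a matrix program uses. As a sketch under that caveat, the code below assumes the 1-norm version, $\|A\|_1 \, \|A^{-1}\|_1$ (maximum absolute column sum), which can be computed exactly with rational arithmetic; many programs default to a different norm and will report a somewhat different value. The function names are hypothetical.

```python
from fractions import Fraction

def hilbert(n):
    """Return the n x n Hilbert matrix with entries 1/(i + j - 1), 1-based."""
    return [[Fraction(1, i + j + 1) for j in range(n)] for i in range(n)]

def inverse(A):
    """Row reduce [A I] to [I A^{-1}] exactly; return None if A is singular."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [row[n:] for row in M]

def norm1(A):
    """Maximum absolute column sum of A."""
    return max(sum(abs(A[i][j]) for i in range(len(A))) for j in range(len(A[0])))

def cond1(A):
    """1-norm condition number (assumed definition): ||A||_1 * ||A^{-1}||_1."""
    return norm1(A) * norm1(inverse(A))

print(cond1([[1, 0], [0, 1]]))   # 1 for the identity matrix
print(float(cond1(hilbert(5))))  # large: the Hilbert matrix is ill-conditioned
```

Because the arithmetic here is exact, the large value reflects the matrix itself rather than roundoff; in floating-point work, a condition number of this size signals that several digits of accuracy may be lost.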


PRACTICE PROBLEMS

1. Determine if $A = \begin{bmatrix} 2 & 3 & 4 \\ 2 & 3 & 4 \\ 2 & 3 & 4 \end{bmatrix}$ is invertible.

2. Suppose that for a certain $n \times n$ matrix $A$, statement (g) of the Invertible Matrix
Theorem is not true. What can you say about equations of the form $A\mathbf{x} = \mathbf{b}$?

3. Suppose that $A$ and $B$ are $n \times n$ matrices and the equation $AB\mathbf{x} = \mathbf{0}$ has a nontrivial
solution. What can you say about the matrix $AB$?

2.3 EXERCISES

Unless otherwise specified, assume that all matrices in these
exercises are $n \times n$. Determine which of the matrices in Exercises
1-10 are invertible. Use as few calculations as possible. Justify
your answers.

6 " 

-9 

0 4" 

0 -1 
0 9 _ 

-5 -4" 

3 4 

6 °_ 

3 7 4" 

5 9 6 

0 2 8 

0 0 10 


In Exercises 11 and 12, the matrices are all $n \times n$. Each part of
the exercises is an implication of the form “If ‘statement 1’,
then ‘statement 2’.” Mark an implication as True if the truth of
“statement 2” always follows whenever “statement 1” happens
to be true. An implication is False if there is an instance in
which “statement 2” is false but “statement 1” is true. Justify each
answer.

11. a. If the equation $A\mathbf{x} = \mathbf{0}$ has only the trivial solution, then
$A$ is row equivalent to the $n \times n$ identity matrix.

b. If the columns of $A$ span $\mathbb{R}^n$, then the columns are linearly
independent.

c. If $A$ is an $n \times n$ matrix, then the equation $A\mathbf{x} = \mathbf{b}$ has at
least one solution for each $\mathbf{b}$ in $\mathbb{R}^n$.

d. If the equation $A\mathbf{x} = \mathbf{0}$ has a nontrivial solution, then $A$
has fewer than $n$ pivot positions.

e. If $A^T$ is not invertible, then $A$ is not invertible.

12. a. If there is an $n \times n$ matrix $D$ such that $AD = I$, then there
is also an $n \times n$ matrix $C$ such that $CA = I$.

b. If the columns of $A$ are linearly independent, then the
columns of $A$ span $\mathbb{R}^n$.

c. If the equation $A\mathbf{x} = \mathbf{b}$ has at least one solution for each
$\mathbf{b}$ in $\mathbb{R}^n$, then the solution is unique for each $\mathbf{b}$.


1 


3 


5 


7. 


5 

3 

5 

-3 

8 

0 

1 

-4 


7 

-6 


0 

-7 

5 

3 

0 

-9 


1 

3 

2 

0 


3 

5 

6 
1 


9. [M] 


4 

-6 

7 


0 

0 

-1 

-5 

2 

7 


0 

8 

3 

2 


1 

3 

2 

1 


0 

1 

5 


-7 

11 

10 


10. [M] 


2 


4 


-4 

6 

-7 

3 


6 


1 

0 

-3 


8 . 


1 

0 

0 

0 


-7 

9 

19 


-1 


2 

3 

i 

5 

3 

1 

7 

9 

6 

4 

2 

8 

OO 

7 

5 

3 

10 

9 

9 

6 

4 

-9 

-5 

OO 

5 

2 

11 

4 





d. If the linear transformation $\mathbf{x} \mapsto A\mathbf{x}$ maps $\mathbb{R}^n$ into $\mathbb{R}^n$,
then $A$ has $n$ pivot positions.

e. If there is a $\mathbf{b}$ in $\mathbb{R}^n$ such that the equation $A\mathbf{x} = \mathbf{b}$ is
inconsistent, then the transformation $\mathbf{x} \mapsto A\mathbf{x}$ is not
one-to-one.

13. An $m \times n$ upper triangular matrix is one whose entries
below the main diagonal are 0’s (as in Exercise 8). When
is a square upper triangular matrix invertible? Justify your
answer.

14. An $m \times n$ lower triangular matrix is one whose entries
above the main diagonal are 0’s (as in Exercise 3). When
is a square lower triangular matrix invertible? Justify your
answer.

15. Can a square matrix with two identical columns be
invertible? Why or why not?

16. Is it possible for a $5 \times 5$ matrix to be invertible when its
columns do not span $\mathbb{R}^5$? Why or why not?

17. If $A$ is invertible, then the columns of $A^{-1}$ are linearly
independent. Explain why.

18. If $C$ is $6 \times 6$ and the equation $C\mathbf{x} = \mathbf{v}$ is consistent for every
$\mathbf{v}$ in $\mathbb{R}^6$, is it possible that for some $\mathbf{v}$, the equation $C\mathbf{x} = \mathbf{v}$
has more than one solution? Why or why not?

19. If the columns of a $7 \times 7$ matrix $D$ are linearly independent,
what can you say about solutions of $D\mathbf{x} = \mathbf{b}$? Why?

20. If $n \times n$ matrices $E$ and $F$ have the property that $EF = I$,
then $E$ and $F$ commute. Explain why.

21. If the equation $G\mathbf{x} = \mathbf{y}$ has more than one solution for some
$\mathbf{y}$ in $\mathbb{R}^n$, can the columns of $G$ span $\mathbb{R}^n$? Why or why not?

22. If the equation $H\mathbf{x} = \mathbf{c}$ is inconsistent for some $\mathbf{c}$ in $\mathbb{R}^n$, what
can you say about the equation $H\mathbf{x} = \mathbf{0}$? Why?

23. If an $n \times n$ matrix $K$ cannot be row reduced to $I_n$, what can
you say about the columns of $K$? Why?

24. If $L$ is $n \times n$ and the equation $L\mathbf{x} = \mathbf{0}$ has the trivial solution,
do the columns of $L$ span $\mathbb{R}^n$? Why?

25. Verify the boxed statement preceding Example 1.

26. Explain why the columns of $A^2$ span $\mathbb{R}^n$ whenever the
columns of $A$ are linearly independent.

27. Show that if $AB$ is invertible, so is $A$. You cannot use Theorem
6(b), because you cannot assume that $A$ and $B$ are invertible.
[Hint: There is a matrix $W$ such that $ABW = I$. Why?]

28. Show that if $AB$ is invertible, so is $B$.

29. If $A$ is an $n \times n$ matrix and the equation $A\mathbf{x} = \mathbf{b}$ has more than
one solution for some $\mathbf{b}$, then the transformation $\mathbf{x} \mapsto A\mathbf{x}$ is
not one-to-one. What else can you say about this
transformation? Justify your answer.

30. If $A$ is an $n \times n$ matrix and the transformation $\mathbf{x} \mapsto A\mathbf{x}$ is
one-to-one, what else can you say about this transformation?
Justify your answer.

31. Suppose $A$ is an $n \times n$ matrix with the property that the
equation $A\mathbf{x} = \mathbf{b}$ has at least one solution for each $\mathbf{b}$ in $\mathbb{R}^n$.
Without using Theorems 5 or 8, explain why each equation
$A\mathbf{x} = \mathbf{b}$ has in fact exactly one solution.

32. Suppose $A$ is an $n \times n$ matrix with the property that the
equation $A\mathbf{x} = \mathbf{0}$ has only the trivial solution. Without using the
Invertible Matrix Theorem, explain directly why the equation
$A\mathbf{x} = \mathbf{b}$ must have a solution for each $\mathbf{b}$ in $\mathbb{R}^n$.

In Exercises 33 and 34, $T$ is a linear transformation from $\mathbb{R}^2$ into
$\mathbb{R}^2$. Show that $T$ is invertible and find a formula for $T^{-1}$.

33. $T(x_1, x_2) = (-5x_1 + 9x_2,\; 4x_1 - 7x_2)$

34. $T(x_1, x_2) = (6x_1 - 8x_2,\; -5x_1 + 7x_2)$

35. Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be an invertible linear transformation.
Explain why $T$ is both one-to-one and onto $\mathbb{R}^n$. Use equations
(1) and (2). Then give a second explanation using one or more
theorems.

36. Let $T$ be a linear transformation that maps $\mathbb{R}^n$ onto $\mathbb{R}^n$. Show
that $T^{-1}$ exists and maps $\mathbb{R}^n$ onto $\mathbb{R}^n$. Is $T^{-1}$ also
one-to-one?

37. Suppose $T$ and $U$ are linear transformations from $\mathbb{R}^n$ to $\mathbb{R}^n$
such that $T(U\mathbf{x}) = \mathbf{x}$ for all $\mathbf{x}$ in $\mathbb{R}^n$. Is it true that $U(T\mathbf{x}) = \mathbf{x}$
for all $\mathbf{x}$ in $\mathbb{R}^n$? Why or why not?

38. Suppose a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^n$ has the
property that $T(\mathbf{u}) = T(\mathbf{v})$ for some pair of distinct vectors $\mathbf{u}$ and
$\mathbf{v}$ in $\mathbb{R}^n$. Can $T$ map $\mathbb{R}^n$ onto $\mathbb{R}^n$? Why or why not?

39. Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be an invertible linear transformation,
and let $S$ and $U$ be functions from $\mathbb{R}^n$ into $\mathbb{R}^n$ such that
$S(T(\mathbf{x})) = \mathbf{x}$ and $U(T(\mathbf{x})) = \mathbf{x}$ for all $\mathbf{x}$ in $\mathbb{R}^n$. Show that
$U(\mathbf{v}) = S(\mathbf{v})$ for all $\mathbf{v}$ in $\mathbb{R}^n$. This will show that $T$ has a
unique inverse, as asserted in Theorem 9. [Hint: Given any
$\mathbf{v}$ in $\mathbb{R}^n$, we can write $\mathbf{v} = T(\mathbf{x})$ for some $\mathbf{x}$. Why? Compute
$S(\mathbf{v})$ and $U(\mathbf{v})$.]

40. Suppose $T$ and $S$ satisfy the invertibility equations (1) and
(2), where $T$ is a linear transformation. Show directly that
$S$ is a linear transformation. [Hint: Given $\mathbf{u}$, $\mathbf{v}$ in $\mathbb{R}^n$, let
$\mathbf{x} = S(\mathbf{u})$, $\mathbf{y} = S(\mathbf{v})$. Then $T(\mathbf{x}) = \mathbf{u}$, $T(\mathbf{y}) = \mathbf{v}$. Why? Apply
$S$ to both sides of the equation $T(\mathbf{x}) + T(\mathbf{y}) = T(\mathbf{x} + \mathbf{y})$.
Also, consider $T(c\mathbf{x}) = cT(\mathbf{x})$.]


41. [M] Suppose an experiment leads to the following system of 
equations: 

4.5xi+ 3.1x 2 = 19.249 (3) 

1.6xi T l.lx 2 — 6.843 

a. Solve system (3), and then solve system (4), below, in 
which the data on the right have been rounded to two 
decimal places. In each case, find the exact solution. 

4.5xi + 3.1x 2 = 19.25 (4) 

1.6xi T l.lx 2 — 6.84 

b. The entries in (4) differ from those in (3) by less than 
.05%. Lind the percentage error when using the solution 
of (4) as an approximation for the solution of (3). 

c. Use your matrix program to produce the condition num¬ 
ber of the coefficient matrix in (3). 


Exercises 42-44 show how to use the condition number of a matrix
$A$ to estimate the accuracy of a computed solution of $A\mathbf{x} = \mathbf{b}$.
If the entries of $A$ and $\mathbf{b}$ are accurate to about $r$ significant digits
and if the condition number of $A$ is approximately $10^k$ (with $k$ a
positive integer), then the computed solution of $A\mathbf{x} = \mathbf{b}$ should
usually be accurate to at least $r - k$ significant digits.


42. [M] Find the condition number of the matrix $A$ in Exercise 9.
Construct a random vector $\mathbf{x}$ in $\mathbb{R}^4$ and compute $\mathbf{b} = A\mathbf{x}$.
Then use your matrix program to compute the solution $\mathbf{x}_1$
of $A\mathbf{x} = \mathbf{b}$. To how many digits do $\mathbf{x}$ and $\mathbf{x}_1$ agree? Find out
the number of digits your matrix program stores accurately,
and report how many digits of accuracy are lost when $\mathbf{x}_1$ is
used in place of the exact solution $\mathbf{x}$.


43. [M] Repeat Exercise 42 for the matrix in Exercise 10. 

44. [M] Solve an equation $A\mathbf{x} = \mathbf{b}$ for a suitable $\mathbf{b}$ to find the last
column of the inverse of the fifth-order Hilbert matrix

\[
A = \begin{bmatrix}
1 & 1/2 & 1/3 & 1/4 & 1/5 \\
1/2 & 1/3 & 1/4 & 1/5 & 1/6 \\
1/3 & 1/4 & 1/5 & 1/6 & 1/7 \\
1/4 & 1/5 & 1/6 & 1/7 & 1/8 \\
1/5 & 1/6 & 1/7 & 1/8 & 1/9
\end{bmatrix}
\]

How many digits in each entry of $\mathbf{x}$ do you expect to be
correct? Explain. [Note: The exact solution is (630, -12600,
56700, -88200, 44100).]


45. [M] Some matrix programs, such as MATLAB, have a command
to create Hilbert matrices of various sizes. If possible,
use an inverse command to compute the inverse of a twelfth-order
or larger Hilbert matrix, $A$. Compute $AA^{-1}$. Report
what you find.





SOLUTIONS TO PRACTICE PROBLEMS 

1. The columns of $A$ are obviously linearly dependent because columns 2 and 3 are
multiples of column 1. Hence $A$ cannot be invertible, by the Invertible Matrix
Theorem.

2. If statement (g) is not true, then the equation $A\mathbf{x} = \mathbf{b}$ is inconsistent for at least one
$\mathbf{b}$ in $\mathbb{R}^n$.

3. Apply the Invertible Matrix Theorem to the matrix AB in place of A . Then statement 
(d) becomes: ABx = 0 has only the trivial solution. This is not true. So AB is not 
invertible. 


2.4 PARTITIONED MATRICES


A key feature of our work with matrices has been the ability to regard a matrix $A$ as a list
of column vectors rather than just a rectangular array of numbers. This point of view has
been so useful that we wish to consider other partitions of $A$, indicated by horizontal
and vertical dividing rules, as in Example 1 below. Partitioned matrices appear in most
modern applications of linear algebra because the notation highlights essential structures
in matrix analysis, as in the chapter introductory example on aircraft design. This
section provides an opportunity to review matrix algebra and use the Invertible Matrix
Theorem.



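EXAMPLE 1 The matrix

\[
A = \left[\begin{array}{rrr|rr|r}
3 & 0 & -1 & 5 & 9 & -2 \\
-5 & 2 & 4 & 0 & -3 & 1 \\ \hline
-8 & -6 & 3 & 1 & 7 & -4
\end{array}\right]
\]

can also be written as the 2 x 3 partitioned (or block) matrix

\[
A = \begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \end{bmatrix}
\]

whose entries are the blocks (or submatrices)

\[
A_{11} = \begin{bmatrix} 3 & 0 & -1 \\ -5 & 2 & 4 \end{bmatrix}, \quad
A_{12} = \begin{bmatrix} 5 & 9 \\ 0 & -3 \end{bmatrix}, \quad
A_{13} = \begin{bmatrix} -2 \\ 1 \end{bmatrix}
\]

\[
A_{21} = \begin{bmatrix} -8 & -6 & 3 \end{bmatrix}, \quad
A_{22} = \begin{bmatrix} 1 & 7 \end{bmatrix}, \quad
A_{23} = \begin{bmatrix} -4 \end{bmatrix} \qquad ■
\]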


EXAMPLE 2 When a matrix A appears in a mathematical model of a physical 
system such as an electrical network, a transportation system, or a large corporation, 
it may be natural to regard A as a partitioned matrix. For instance, if a microcomputer 
circuit board consists mainly of three VLSI (very large-scale integrated) microchips, 
then the matrix for the circuit board might have the general form

\[
A = \begin{bmatrix}
A_{11} & A_{12} & A_{13} \\
A_{21} & A_{22} & A_{23} \\
A_{31} & A_{32} & A_{33}
\end{bmatrix}
\]

The submatrices on the “diagonal” of $A$, namely $A_{11}$, $A_{22}$, and $A_{33}$, concern the three
VLSI chips, while the other submatrices depend on the interconnections among those
microchips. ■


Addition and Scalar Multiplication 

If matrices A and B are the same size and are partitioned in exactly the same way, 
then it is natural to make the same partition of the ordinary matrix sum A + B. In this
case, each block of A + B is the (matrix) sum of the corresponding blocks of A and B . 
Multiplication of a partitioned matrix by a scalar is also computed block by block. 


Multiplication of Partitioned Matrices 

Partitioned matrices can be multiplied by the usual row-column rule as if the block 
entries were scalars, provided that for a product AB , the column partition of A matches 
the row partition of B . 


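EXAMPLE 3 Let

\[
A = \left[\begin{array}{rrr|rr}
2 & -3 & 1 & 0 & -4 \\
1 & 5 & -2 & 3 & -1 \\ \hline
0 & -4 & -2 & 7 & -1
\end{array}\right]
= \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \qquad
B = \left[\begin{array}{rr}
6 & 4 \\ -2 & 1 \\ -3 & 7 \\ \hline -1 & 3 \\ 5 & 2
\end{array}\right]
= \begin{bmatrix} B_1 \\ B_2 \end{bmatrix}
\]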




The 5 columns of A are partitioned into a set of 3 columns and then a set of 2 
columns. The 5 rows of B are partitioned in the same way —into a set of 3 rows and 
then a set of 2 rows. We say that the partitions of A and B are conformable for block 
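multiplication. It can be shown that the ordinary product $AB$ can be written as

\[
AB = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\begin{bmatrix} B_1 \\ B_2 \end{bmatrix}
= \begin{bmatrix} A_{11}B_1 + A_{12}B_2 \\ A_{21}B_1 + A_{22}B_2 \end{bmatrix}
= \left[\begin{array}{rr} -5 & 4 \\ -6 & 2 \\ \hline 2 & 1 \end{array}\right]
\]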

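It is important for each smaller product in the expression for $AB$ to be written with
the submatrix from $A$ on the left, since matrix multiplication is not commutative. For
instance,

\[
A_{11}B_1 = \begin{bmatrix} 2 & -3 & 1 \\ 1 & 5 & -2 \end{bmatrix}
\begin{bmatrix} 6 & 4 \\ -2 & 1 \\ -3 & 7 \end{bmatrix}
= \begin{bmatrix} 15 & 12 \\ 2 & -5 \end{bmatrix}
\]

\[
A_{12}B_2 = \begin{bmatrix} 0 & -4 \\ 3 & -1 \end{bmatrix}
\begin{bmatrix} -1 & 3 \\ 5 & 2 \end{bmatrix}
= \begin{bmatrix} -20 & -8 \\ -8 & 7 \end{bmatrix}
\]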

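Hence the top block in $AB$ is

\[
A_{11}B_1 + A_{12}B_2
= \begin{bmatrix} 15 & 12 \\ 2 & -5 \end{bmatrix}
+ \begin{bmatrix} -20 & -8 \\ -8 & 7 \end{bmatrix}
= \begin{bmatrix} -5 & 4 \\ -6 & 2 \end{bmatrix} \qquad ■
\]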
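The block rule of Example 3 can be checked numerically. The following is a minimal sketch, not part of the text: the helper names `matmul` and `matadd` are hypothetical, and the block entries are read off the products $A_{11}B_1$ and $A_{12}B_2$ in Example 3. It compares the block computation of the top block of $AB$ with the ordinary row-column product.

```python
def matmul(P, Q):
    """Row-column product of matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)]
            for row in P]


def matadd(P, Q):
    """Entrywise sum of two matrices of the same size."""
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]


# Top blocks of A and the two blocks of B (entries taken from Example 3).
A11 = [[2, -3, 1], [1, 5, -2]]
A12 = [[0, -4], [3, -1]]
B1 = [[6, 4], [-2, 1], [-3, 7]]
B2 = [[-1, 3], [5, 2]]

# Block rule: top block of AB = A11*B1 + A12*B2, with A's blocks on the left.
top_block = matadd(matmul(A11, B1), matmul(A12, B2))

# Ordinary rule: glue the blocks back together and multiply entry by entry.
top_rows_of_A = [left + right for left, right in zip(A11, A12)]
B = B1 + B2  # stack the rows of B1 on top of the rows of B2
assert top_block == matmul(top_rows_of_A, B)
print(top_block)
```

The two computations agree only because the column partition of $A$ (3 columns, then 2) matches the row partition of $B$ (3 rows, then 2).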


The row-column rule for multiplication of block matrices provides the most general 
way to regard the product of two matrices. Each of the following views of a product 
has already been described using simple partitions of matrices: (1) the definition of Ax 
using the columns of A, (2) the column definition of AB, (3) the row-column rule for 
computing AB , and (4) the rows of AB as products of the rows of A and the matrix B . 
A fifth view of AB , again using partitions, follows in Theorem 10 below. 

The calculations in the next example prepare the way for Theorem 10. Here $\mathrm{col}_k(A)$
is the $k$th column of $A$, and $\mathrm{row}_k(B)$ is the $k$th row of $B$.


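EXAMPLE 4 Let
$A = \begin{bmatrix} -3 & 1 & 2 \\ 1 & -4 & 5 \end{bmatrix}$ and
$B = \begin{bmatrix} a & b \\ c & d \\ e & f \end{bmatrix}$. Verify that

\[
AB = \mathrm{col}_1(A)\,\mathrm{row}_1(B) + \mathrm{col}_2(A)\,\mathrm{row}_2(B) + \mathrm{col}_3(A)\,\mathrm{row}_3(B)
\]

SOLUTION Each term above is an outer product. (See Exercises 27 and 28 in Section 2.1.)
By the row-column rule for computing a matrix product,

\[
\mathrm{col}_1(A)\,\mathrm{row}_1(B) = \begin{bmatrix} -3 \\ 1 \end{bmatrix}\begin{bmatrix} a & b \end{bmatrix}
= \begin{bmatrix} -3a & -3b \\ a & b \end{bmatrix}
\]

\[
\mathrm{col}_2(A)\,\mathrm{row}_2(B) = \begin{bmatrix} 1 \\ -4 \end{bmatrix}\begin{bmatrix} c & d \end{bmatrix}
= \begin{bmatrix} c & d \\ -4c & -4d \end{bmatrix}
\]

\[
\mathrm{col}_3(A)\,\mathrm{row}_3(B) = \begin{bmatrix} 2 \\ 5 \end{bmatrix}\begin{bmatrix} e & f \end{bmatrix}
= \begin{bmatrix} 2e & 2f \\ 5e & 5f \end{bmatrix}
\]

Thus

\[
\sum_{k=1}^{3} \mathrm{col}_k(A)\,\mathrm{row}_k(B)
= \begin{bmatrix} -3a + c + 2e & -3b + d + 2f \\ a - 4c + 5e & b - 4d + 5f \end{bmatrix}
\]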

This matrix is obviously AB. Notice that the (1,1)-entry in AB is the sum of the (1,1)- 
entries in the three outer products, the (1,2)-entry in AB is the sum of the (1,2)-entries 
in the three outer products, and so on. ■ 


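THEOREM 10

Column-Row Expansion of AB
If $A$ is $m \times n$ and $B$ is $n \times p$, then

\[
AB = \begin{bmatrix} \mathrm{col}_1(A) & \mathrm{col}_2(A) & \cdots & \mathrm{col}_n(A) \end{bmatrix}
\begin{bmatrix} \mathrm{row}_1(B) \\ \mathrm{row}_2(B) \\ \vdots \\ \mathrm{row}_n(B) \end{bmatrix}
= \mathrm{col}_1(A)\,\mathrm{row}_1(B) + \cdots + \mathrm{col}_n(A)\,\mathrm{row}_n(B)
\tag{1}
\]

PROOF For each row index $i$ and column index $j$, the $(i, j)$-entry in $\mathrm{col}_k(A)\,\mathrm{row}_k(B)$
is the product of $a_{ik}$ from $\mathrm{col}_k(A)$ and $b_{kj}$ from $\mathrm{row}_k(B)$. Hence the $(i, j)$-entry in the
sum shown in equation (1) is

\[
\underbrace{a_{i1}b_{1j}}_{(k=1)} + \underbrace{a_{i2}b_{2j}}_{(k=2)} + \cdots + \underbrace{a_{in}b_{nj}}_{(k=n)}
\]

This sum is also the $(i, j)$-entry in $AB$, by the row-column rule. ■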
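As an illustration of Theorem 10, here is a minimal sketch that builds $AB$ as a sum of outer products and compares it with the row-column rule. The function names are hypothetical, the matrix $A$ is the one from Example 4, and the numerical entries of $B$ are sample values standing in for $a, \ldots, f$.

```python
def matmul(P, Q):
    """Row-column product of matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)]
            for row in P]


def outer(col, row):
    """Outer product of a column vector and a row vector (both plain lists)."""
    return [[c * r for r in row] for c in col]


def column_row_expansion(A, B):
    """Sum of col_k(A) row_k(B) over k, as in equation (1) of Theorem 10."""
    m, n, p = len(A), len(B), len(B[0])
    total = [[0] * p for _ in range(m)]
    for k in range(n):
        col_k = [A[i][k] for i in range(m)]
        term = outer(col_k, B[k])  # B[k] is row k of B
        total = [[t + u for t, u in zip(rt, ru)] for rt, ru in zip(total, term)]
    return total


A = [[-3, 1, 2], [1, -4, 5]]      # the matrix A of Example 4
B = [[1, 2], [3, 4], [5, 6]]      # sample values for a, b, c, d, e, f

assert column_row_expansion(A, B) == matmul(A, B)
print(column_row_expansion(A, B))
```

Each outer product is an $m \times p$ matrix, so the expansion trades $mp$ inner products of length $n$ for $n$ rank-one updates of the same total cost.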


Inverses of Partitioned Matrices 

The next example illustrates calculations involving inverses and partitioned matrices. 


EXAMPLE 5 A matrix of the form

\[
A = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}
\]

is said to be block upper triangular. Assume that $A_{11}$ is $p \times p$, $A_{22}$ is $q \times q$, and $A$ is
invertible. Find a formula for $A^{-1}$.

SOLUTION Denote $A^{-1}$ by $B$ and partition $B$ so that

\[
\begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}
\begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}
= \begin{bmatrix} I_p & 0 \\ 0 & I_q \end{bmatrix}
\tag{2}
\]

This matrix equation provides four equations that will lead to the unknown blocks
$B_{11}, \ldots, B_{22}$. Compute the product on the left side of equation (2), and equate each entry
with the corresponding block in the identity matrix on the right. That is, set

\[
\begin{aligned}
A_{11}B_{11} + A_{12}B_{21} &= I_p && (3) \\
A_{11}B_{12} + A_{12}B_{22} &= 0 && (4) \\
A_{22}B_{21} &= 0 && (5) \\
A_{22}B_{22} &= I_q && (6)
\end{aligned}
\]

By itself, equation (6) does not show that $A_{22}$ is invertible. However, since $A_{22}$ is
square, the Invertible Matrix Theorem and (6) together show that $A_{22}$ is invertible and
$B_{22} = A_{22}^{-1}$. Next, left-multiply both sides of (5) by $A_{22}^{-1}$ and obtain

\[
B_{21} = A_{22}^{-1}\,0 = 0
\]

so that (3) simplifies to

\[
A_{11}B_{11} + 0 = I_p
\]

Since $A_{11}$ is square, this shows that $A_{11}$ is invertible and $B_{11} = A_{11}^{-1}$. Finally, use these
results with (4) to find that

\[
A_{11}B_{12} = -A_{12}B_{22} = -A_{12}A_{22}^{-1} \quad\text{and}\quad B_{12} = -A_{11}^{-1}A_{12}A_{22}^{-1}
\]

Thus

\[
A^{-1} = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}^{-1}
= \begin{bmatrix} A_{11}^{-1} & -A_{11}^{-1}A_{12}A_{22}^{-1} \\ 0 & A_{22}^{-1} \end{bmatrix} \qquad ■
\]

A block diagonal matrix is a partitioned matrix with zero blocks off the main 
diagonal (of blocks). Such a matrix is invertible if and only if each block on the diagonal 
is invertible. See Exercises 13 and 14. 


NUMERICAL NOTES - 

1. When matrices are too large to fit in a computer’s high-speed memory,
partitioning permits the computer to work with only two or three submatrices
at a time. For instance, one linear programming research team simplified
a problem by partitioning the matrix into 837 rows and 51 columns. The
problem’s solution took about 4 minutes on a Cray supercomputer.¹

2. Some high-speed computers, particularly those with vector pipeline architecture,
perform matrix calculations more efficiently when the algorithms use
partitioned matrices.²

3. Professional software for high-performance numerical linear algebra, such as 
LAPACK, makes intensive use of partitioned matrix calculations. 


¹ The solution time doesn’t sound too impressive until you learn that each of the 51 block columns contained
about 250,000 individual columns. The original problem had 837 equations and more than 12,750,000
variables! Nearly 100 million of the more than 10 billion entries in the matrix were nonzero. See Robert E.
Bixby et al., “Very Large-Scale Linear Programming: A Case Study in Combining Interior Point and
Simplex Methods,” Operations Research, 40, no. 5 (1992): 885-897.

² The importance of block matrix algorithms for computer calculations is described in Matrix Computations,
3rd ed., by Gene H. Golub and Charles F. Van Loan (Baltimore: Johns Hopkins University Press, 1996).

The exercises that follow give practice with matrix algebra and illustrate typical
calculations found in applications.


PRACTICE PROBLEMS 


1. Show that $\begin{bmatrix} I & 0 \\ A & I \end{bmatrix}$ is invertible and find its inverse.

2. Compute $X^TX$, where $X$ is partitioned as $\begin{bmatrix} X_1 & X_2 \end{bmatrix}$.


2.4 EXERCISES 


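In Exercises 1-9, assume that the matrices are partitioned conformably
for block multiplication. Compute the products shown
in Exercises 1-4.

1. $\begin{bmatrix} I & 0 \\ E & I \end{bmatrix}\begin{bmatrix} A & B \\ C & D \end{bmatrix}$

2. $\begin{bmatrix} E & 0 \\ 0 & F \end{bmatrix}\begin{bmatrix} A & B \\ C & D \end{bmatrix}$

3. $\begin{bmatrix} 0 & I \\ I & 0 \end{bmatrix}\begin{bmatrix} W & X \\ Y & Z \end{bmatrix}$

4. $\begin{bmatrix} I & 0 \\ -X & I \end{bmatrix}\begin{bmatrix} A & B \\ C & D \end{bmatrix}$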


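In Exercises 5-8, find formulas for $X$, $Y$, and $Z$ in terms of $A$, $B$,
and $C$, and justify your calculations. In some cases, you may need
to make assumptions about the size of a matrix in order to produce
a formula. [Hint: Compute the product on the left, and set it equal
to the right side.]

5. $\begin{bmatrix} A & B \\ C & 0 \end{bmatrix}\begin{bmatrix} I & 0 \\ X & Y \end{bmatrix} = \begin{bmatrix} 0 & I \\ Z & 0 \end{bmatrix}$

6. $\begin{bmatrix} X & 0 \\ Y & Z \end{bmatrix}\begin{bmatrix} A & 0 \\ B & C \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix}$

7. $\begin{bmatrix} X & 0 & 0 \\ Y & 0 & I \end{bmatrix}\begin{bmatrix} A & Z \\ 0 & 0 \\ B & I \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix}$

8. $\begin{bmatrix} A & B \\ 0 & I \end{bmatrix}\begin{bmatrix} X & Y & Z \\ 0 & 0 & I \end{bmatrix} = \begin{bmatrix} I & 0 & 0 \\ 0 & 0 & I \end{bmatrix}$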

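In Exercises 11 and 12, mark each statement True or False. Justify
each answer.

11. a. If $A = \begin{bmatrix} A_1 & A_2 \end{bmatrix}$ and $B = \begin{bmatrix} B_1 & B_2 \end{bmatrix}$, with $A_1$ and $A_2$
the same sizes as $B_1$ and $B_2$, respectively, then
$A + B = \begin{bmatrix} A_1 + B_1 & A_2 + B_2 \end{bmatrix}$.

b. If $A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$ and $B = \begin{bmatrix} B_1 \\ B_2 \end{bmatrix}$, then the partitions
of $A$ and $B$ are conformable for block multiplication.

12. a. The definition of the matrix-vector product $A\mathbf{x}$ is a special
case of block multiplication.

b. If $A_1$, $A_2$, $B_1$, and $B_2$ are $n \times n$ matrices, $A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}$, and
$B = \begin{bmatrix} B_1 & B_2 \end{bmatrix}$, then the product $BA$ is defined, but $AB$ is
not.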


13. Let $A = \begin{bmatrix} B & 0 \\ 0 & C \end{bmatrix}$, where $B$ and $C$ are square. Show that $A$
is invertible if and only if both $B$ and $C$ are invertible.

14. Show that the block upper triangular matrix $A$ in Example 5 is
invertible if and only if both $A_{11}$ and $A_{22}$ are invertible. [Hint:
If $A_{11}$ and $A_{22}$ are invertible, the formula for $A^{-1}$ given in
Example 5 actually works as the inverse of $A$.] This fact about
$A$ is an important part of several computer algorithms that
estimate eigenvalues of matrices. Eigenvalues are discussed
in Chapter 5.


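9. Suppose $A_{11}$ is an invertible matrix. Find matrices $X$ and $Y$
such that the product below has the form indicated. Also,
compute $B_{22}$. [Hint: Compute the product on the left, and set
it equal to the right side.]

\[
\begin{bmatrix} I & 0 & 0 \\ X & I & 0 \\ Y & 0 & I \end{bmatrix}
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \\ A_{31} & A_{32} \end{bmatrix}
= \begin{bmatrix} B_{11} & B_{12} \\ 0 & B_{22} \\ 0 & B_{32} \end{bmatrix}
\]

10. The inverse of
$\begin{bmatrix} I & 0 & 0 \\ C & I & 0 \\ A & B & I \end{bmatrix}$ is
$\begin{bmatrix} I & 0 & 0 \\ Z & I & 0 \\ X & Y & I \end{bmatrix}$.
Find $X$, $Y$, and $Z$.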


15. Suppose $A_{11}$ is invertible. Find $X$ and $Y$ such that

\[
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
= \begin{bmatrix} I & 0 \\ X & I \end{bmatrix}
\begin{bmatrix} A_{11} & 0 \\ 0 & S \end{bmatrix}
\begin{bmatrix} I & Y \\ 0 & I \end{bmatrix}
\tag{7}
\]

where $S = A_{22} - A_{21}A_{11}^{-1}A_{12}$. The matrix $S$ is called the
Schur complement of $A_{11}$. Likewise, if $A_{22}$ is invertible,
the matrix $A_{11} - A_{12}A_{22}^{-1}A_{21}$ is called the Schur complement
of $A_{22}$. Such expressions occur frequently in the theory of
systems engineering, and elsewhere.

16. Suppose the block matrix $A$ on the left side of (7) is invertible
and $A_{11}$ is invertible. Show that the Schur complement $S$ of
$A_{11}$ is invertible. [Hint: The outside factors on the right side
of (7) are always invertible. Verify this.] When $A$ and $A_{11}$ are
both invertible, (7) leads to a formula for $A^{-1}$, using $S^{-1}$,
$A_{11}^{-1}$, and the other entries in $A$.

17. When a deep space probe is launched, corrections may
be necessary to place the probe on a precisely calculated
trajectory. Radio telemetry provides a stream of vectors,
$\mathbf{x}_1, \ldots, \mathbf{x}_k$, giving information at different times about how
the probe’s position compares with its planned trajectory.
Let $X_k$ be the matrix $[\mathbf{x}_1 \cdots \mathbf{x}_k]$. The matrix $G_k = X_kX_k^T$ is
computed as the radar data are analyzed. When $\mathbf{x}_{k+1}$ arrives,
a new $G_{k+1}$ must be computed. Since the data vectors arrive
at high speed, the computational burden could be severe.
But partitioned matrix multiplication helps tremendously.
Compute the column-row expansions of $G_k$ and $G_{k+1}$, and
describe what must be computed in order to update $G_k$ to
form $G_{k+1}$.



The probe Galileo was launched October 18, 
1989, and arrived near Jupiter in early 
December 1995. 


18. Let $X$ be an $m \times n$ data matrix such that $X^TX$ is invertible,
and let $M = I_m - X(X^TX)^{-1}X^T$. Add a column $\mathbf{x}_0$ to the
data and form

\[
W = \begin{bmatrix} X & \mathbf{x}_0 \end{bmatrix}
\]

Compute $W^TW$. The (1, 1)-entry is $X^TX$. Show that the
Schur complement (Exercise 15) of $X^TX$ can be written in
the form $\mathbf{x}_0^TM\mathbf{x}_0$. It can be shown that the quantity
$(\mathbf{x}_0^TM\mathbf{x}_0)^{-1}$ is the (2, 2)-entry in $(W^TW)^{-1}$. This entry
has a useful statistical interpretation, under appropriate
hypotheses.


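In the study of engineering control of physical systems, a standard
set of differential equations is transformed by Laplace transforms
into the following system of linear equations:

\[
\begin{bmatrix} A - sI_n & B \\ C & I_m \end{bmatrix}
\begin{bmatrix} \mathbf{x} \\ \mathbf{u} \end{bmatrix}
= \begin{bmatrix} \mathbf{0} \\ \mathbf{y} \end{bmatrix}
\tag{8}
\]

where $A$ is $n \times n$, $B$ is $n \times m$, $C$ is $m \times n$, and $s$ is a variable. The
vector $\mathbf{u}$ in $\mathbb{R}^m$ is the “input” to the system, $\mathbf{y}$ in $\mathbb{R}^m$ is the “output,”
and $\mathbf{x}$ in $\mathbb{R}^n$ is the “state” vector. (Actually, the vectors $\mathbf{x}$, $\mathbf{u}$, and
$\mathbf{y}$ are functions of $s$, but we suppress this fact because it does not
affect the algebraic calculations in Exercises 19 and 20.)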

19. Assume $A - sI_n$ is invertible and view (8) as a system of two
matrix equations. Solve the top equation for $\mathbf{x}$ and substitute
into the bottom equation. The result is an equation of the
form $W(s)\mathbf{u} = \mathbf{y}$, where $W(s)$ is a matrix that depends on
$s$. $W(s)$ is called the transfer function of the system because
it transforms the input $\mathbf{u}$ into the output $\mathbf{y}$. Find $W(s)$ and
describe how it is related to the partitioned system matrix on
the left side of (8). See Exercise 15.

20. Suppose the transfer function $W(s)$ in Exercise 19 is invertible
for some $s$. It can be shown that the inverse transfer
function $W(s)^{-1}$, which transforms outputs into inputs, is the
Schur complement of $A - BC - sI_n$ for the matrix below.
Find this Schur complement. See Exercise 15.

\[
\begin{bmatrix} A - BC - sI_n & B \\ -C & I_m \end{bmatrix}
\]

21. a. Verify that $A^2 = I$ when $A = \begin{bmatrix} 1 & 0 \\ 3 & -1 \end{bmatrix}$.
b. Use partitioned matrices to show that $M^2 = I$ when



0 

0 

0 

1 


22. Generalize the idea of Exercise 21(a) [not 21(b)] by constructing
a $5 \times 5$ matrix $M = \begin{bmatrix} A & 0 \\ C & D \end{bmatrix}$ such that $M^2 = I$.
Make $C$ a nonzero $2 \times 3$ matrix. Show that your construction
works.


23. Use partitioned matrices to prove by induction that the product
of two lower triangular matrices is also lower triangular.
[Hint: A $(k + 1) \times (k + 1)$ matrix $A_1$ can be written in the
form below, where $a$ is a scalar, $\mathbf{v}$ is in $\mathbb{R}^k$, and $A$ is a $k \times k$
lower triangular matrix. See the Study Guide for help with
induction.]

\[
A_1 = \begin{bmatrix} a & \mathbf{0}^T \\ \mathbf{v} & A \end{bmatrix}
\]

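24. Use partitioned matrices to prove by induction that for
$n = 2, 3, \ldots$, the $n \times n$ matrix $A$ shown below is invertible
and $B$ is its inverse.

\[
A = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
1 & 1 & 0 & & 0 \\
1 & 1 & 1 & & 0 \\
\vdots & & & \ddots & \\
1 & 1 & 1 & \cdots & 1
\end{bmatrix}, \qquad
B = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
-1 & 1 & 0 & & 0 \\
0 & -1 & 1 & & 0 \\
\vdots & & \ddots & \ddots & \\
0 & \cdots & & -1 & 1
\end{bmatrix}
\]

For the induction step, assume $A$ and $B$ are
$(k + 1) \times (k + 1)$ matrices, and partition $A$ and $B$ in a form
similar to that displayed in Exercise 23.

25. Without using row reduction, find the inverse of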



0 

0 

0 

8 

6 


26. [M] For block operations, it may be necessary to access or
enter submatrices of a large matrix. Describe the functions
or commands of your matrix program that accomplish the
following tasks. Suppose $A$ is a $20 \times 30$ matrix.

a. Display the submatrix of $A$ from rows 15 to 20 and
columns 5 to 10.

b. Insert a $5 \times 10$ matrix $B$ into $A$, beginning at row 10 and
column 20.

c. Create a $50 \times 50$ matrix of the form $B = \begin{bmatrix} A & 0 \\ 0 & A^T \end{bmatrix}$.
[Note: It may not be necessary to specify the zero blocks
in $B$.]

27. [M] Suppose memory or size restrictions prevent your matrix
program from working with matrices having more than 32
rows and 32 columns, and suppose some project involves
$50 \times 50$ matrices $A$ and $B$. Describe the commands or operations
of your matrix program that accomplish the following
tasks.

a. Compute $A + B$.

b. Compute $AB$.

c. Solve $A\mathbf{x} = \mathbf{b}$ for some vector $\mathbf{b}$ in $\mathbb{R}^{50}$, assuming that
$A$ can be partitioned into a $2 \times 2$ block matrix $[A_{ij}]$,
with $A_{11}$ an invertible $20 \times 20$ matrix, $A_{22}$ an invertible
$30 \times 30$ matrix, and $A_{12}$ a zero matrix. [Hint: Describe
appropriate smaller systems to solve, without using any
matrix inverses.]


SOLUTIONS TO PRACTICE PROBLEMS 


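1. If $\begin{bmatrix} I & 0 \\ A & I \end{bmatrix}$ is invertible, its inverse has the form
$\begin{bmatrix} W & X \\ Y & Z \end{bmatrix}$. Verify that

\[
\begin{bmatrix} I & 0 \\ A & I \end{bmatrix}
\begin{bmatrix} W & X \\ Y & Z \end{bmatrix}
= \begin{bmatrix} W & X \\ AW + Y & AX + Z \end{bmatrix}
\]

So $W$, $X$, $Y$, and $Z$ must satisfy $W = I$, $X = 0$, $AW + Y = 0$, and $AX + Z = I$.
It follows that $Y = -A$ and $Z = I$. Hence

\[
\begin{bmatrix} I & 0 \\ A & I \end{bmatrix}
\begin{bmatrix} I & 0 \\ -A & I \end{bmatrix}
= \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix}
\]

The product in the reverse order is also the identity, so the block matrix is invertible,
and its inverse is $\begin{bmatrix} I & 0 \\ -A & I \end{bmatrix}$. (You could also appeal to the Invertible Matrix
Theorem.)

2. $X^TX = \begin{bmatrix} X_1^T \\ X_2^T \end{bmatrix}\begin{bmatrix} X_1 & X_2 \end{bmatrix}
= \begin{bmatrix} X_1^TX_1 & X_1^TX_2 \\ X_2^TX_1 & X_2^TX_2 \end{bmatrix}$. The partitions of $X^T$ and $X$ are
automatically conformable for block multiplication because the columns of $X^T$ are
the rows of $X$. This partition of $X^TX$ is used in several computer algorithms for
matrix computations.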


2.5 MATRIX FACTORIZATIONS 


A factorization of a matrix A is an equation that expresses A as a product of two or more 
matrices. Whereas matrix multiplication involves a synthesis of data (combining the 
effects of two or more linear transformations into a single matrix), matrix factorization 
is an analysis of data. In the language of computer science, the expression of A as a 
product amounts to a preprocessing of the data in A, organizing that data into two or 
more parts whose structures are more useful in some way, perhaps more accessible for 
computation.

Matrix factorizations and, later, factorizations of linear transformations will appear
at a number of key points throughout the text. This section focuses on a factorization
that lies at the heart of several important computer programs widely used in applications,
such as the airflow problem described in the chapter introduction. Several other
factorizations, to be studied later, are introduced in the exercises.


The LU Factorization 


The LU factorization, described below, is motivated by the fairly common industrial 
and business problem of solving a sequence of equations, all with the same coefficient 
matrix: 

\[
A\mathbf{x} = \mathbf{b}_1, \quad A\mathbf{x} = \mathbf{b}_2, \quad \ldots, \quad A\mathbf{x} = \mathbf{b}_p \tag{1}
\]


See Exercise 32, for example. Also see Section 5.8, where the inverse power method 
is used to estimate eigenvalues of a matrix by solving equations like those in sequence 
(1), one at a time. 

When $A$ is invertible, one could compute $A^{-1}$ and then compute $A^{-1}\mathbf{b}_1$, $A^{-1}\mathbf{b}_2$,
and so on. However, it is more efficient to solve the first equation in sequence (1) by
row reduction and obtain an LU factorization of $A$ at the same time. Thereafter, the
remaining equations in sequence (1) are solved with the LU factorization.

At first, assume that $A$ is an $m \times n$ matrix that can be row reduced to echelon form,
without row interchanges. (Later, we will treat the general case.) Then $A$ can be written
in the form $A = LU$, where $L$ is an $m \times m$ lower triangular matrix with 1’s on the
diagonal and $U$ is an $m \times n$ echelon form of $A$. For instance, see Figure 1. Such a
factorization is called an LU factorization of $A$. The matrix $L$ is invertible and is called
a unit lower triangular matrix.

FIGURE 1 An LU factorization.


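Before studying how to construct $L$ and $U$, we should look at why they are so
useful. When $A = LU$, the equation $A\mathbf{x} = \mathbf{b}$ can be written as $L(U\mathbf{x}) = \mathbf{b}$. Writing $\mathbf{y}$
for $U\mathbf{x}$, we can find $\mathbf{x}$ by solving the pair of equations

\[
\begin{aligned}
L\mathbf{y} &= \mathbf{b} \\
U\mathbf{x} &= \mathbf{y}
\end{aligned}
\tag{2}
\]

First solve $L\mathbf{y} = \mathbf{b}$ for $\mathbf{y}$, and then solve $U\mathbf{x} = \mathbf{y}$ for $\mathbf{x}$. See Figure 2. Each equation is
easy to solve because $L$ and $U$ are triangular.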
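The two triangular solves can be sketched in a few lines. This is an illustration only: the function names are hypothetical, the code assumes square triangular factors with nonzero diagonal entries, and the $4 \times 4$ factors $L$ and $U$ below are sample matrices assumed for the demonstration (the right side $\mathbf{b}$ matches the one used in Example 1). Exact fractions avoid roundoff in the check.

```python
from fractions import Fraction


def forward_substitution(L, b):
    """Solve Ly = b for lower triangular L, working from the top row down."""
    y = []
    for i in range(len(b)):
        known = sum(L[i][j] * y[j] for j in range(i))
        y.append(Fraction(b[i] - known) / L[i][i])
    return y


def back_substitution(U, y):
    """Solve Ux = y for upper triangular U, working from the bottom row up."""
    n = len(y)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = Fraction(y[i] - known) / U[i][i]
    return x


# Assumed sample factors: L is unit lower triangular, U is upper triangular.
L = [[1, 0, 0, 0], [-1, 1, 0, 0], [2, -5, 1, 0], [-3, 8, 3, 1]]
U = [[3, -7, -2, 2], [0, -2, -1, 2], [0, 0, -1, 1], [0, 0, 0, -1]]
b = [-9, 5, 7, 11]

y = forward_substitution(L, b)   # first solve Ly = b
x = back_substitution(U, y)      # then solve Ux = y
print(y, x)
```

Once $L$ and $U$ are known, each new right side $\mathbf{b}$ costs only these two passes; the factors themselves are never recomputed.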


FIGURE 2 Factorization of the mapping $\mathbf{x} \mapsto A\mathbf{x}$.

EXAMPLE 1 It can be verified that


Use this LU factorization of $A$ to solve $A\mathbf{x} = \mathbf{b}$, where
$\mathbf{b} = \begin{bmatrix} -9 \\ 5 \\ 7 \\ 11 \end{bmatrix}$.

SOLUTION The solution of $L\mathbf{y} = \mathbf{b}$ needs only 6 multiplications and 6 additions, because
the arithmetic takes place only in column 5. (The zeros below each pivot in $L$ are
created automatically by the choice of row operations.)


Then, for $U\mathbf{x} = \mathbf{y}$, the “backward” phase of row reduction requires 4 divisions, 6 multiplications,
and 6 additions. (For instance, creating the zeros in column 4 of $[\,U \;\; \mathbf{y}\,]$
requires 1 division in row 4 and 3 multiplication-addition pairs to add multiples of row 4
to the rows above.)




To find x requires 28 arithmetic operations, or “flops” (floating point operations), 
excluding the cost of finding L and U . In contrast, row reduction of [ A b ] to [ I x ] 
takes 62 operations. ■ 


The computational efficiency of the LU factorization depends on knowing L and U . 
The next algorithm shows that the row reduction of A to an echelon form U amounts to 
an LU factorization because it produces L with essentially no extra work. After the first 
row reduction, L and U are available for solving additional equations whose coefficient 
matrix is A . 


An LU Factorization Algorithm 

Suppose $A$ can be reduced to an echelon form $U$ using only row replacements that add a
multiple of one row to another row below it. In this case, there exist unit lower triangular
elementary matrices $E_1, \ldots, E_p$ such that

\[
E_p \cdots E_1 A = U \tag{3}
\]

Then

\[
A = (E_p \cdots E_1)^{-1} U = LU
\]

where

\[
L = (E_p \cdots E_1)^{-1} \tag{4}
\]

It can be shown that products and inverses of unit lower triangular matrices are also unit
lower triangular. (For instance, see Exercise 19.) Thus $L$ is unit lower triangular.

Note that the row operations in equation (3), which reduce $A$ to $U$, also reduce
the $L$ in equation (4) to $I$, because $E_p \cdots E_1 L = (E_p \cdots E_1)(E_p \cdots E_1)^{-1} = I$. This
observation is the key to constructing $L$.


ALGORITHM FOR AN LU FACTORIZATION 

1. Reduce A to an echelon form U by a sequence of row replacement operations, 
if possible. 

2. Place entries in $L$ such that the same sequence of row operations reduces $L$
to $I$.


Step 1 is not always possible, but when it is, the argument above shows that an LU
factorization exists. Example 2 will show how to implement step 2. By construction, $L$
will satisfy

\[
(E_p \cdots E_1) L = I
\]

using the same $E_1, \ldots, E_p$ as in equation (3). Thus $L$ will be invertible, by the Invertible
Matrix Theorem, with $(E_p \cdots E_1) = L^{-1}$. From (3), $L^{-1}A = U$, and $A = LU$. So
step 2 will produce an acceptable $L$.


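EXAMPLE 2 Find an LU factorization of

\[
A = \begin{bmatrix}
2 & 4 & -1 & 5 & -2 \\
-4 & -5 & 3 & -8 & 1 \\
2 & -5 & -4 & 1 & 8 \\
-6 & 0 & 7 & -3 & 1
\end{bmatrix}
\]

SOLUTION Since $A$ has four rows, $L$ should be $4 \times 4$. The first column of $L$ is the first
column of $A$ divided by the top pivot entry:

\[
L = \begin{bmatrix}
1 & 0 & 0 & 0 \\
-2 & 1 & 0 & 0 \\
1 & & 1 & 0 \\
-3 & & & 1
\end{bmatrix}
\]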

Compare the first columns of A and L. The row operations that create zeros in the 
first column of A will also create zeros in the first column of L. To make this same 
correspondence of row operations on A hold for the rest of L , watch a row reduction 
of A to an echelon form U . That is, highlight the entries in each matrix that are used to 
determine the sequence of row operations that transform $A$ into $U$. [See the highlighted
entries in equation (5).]


The highlighted entries in equation (5) determine the row reduction of $A$ to $U$. At each
pivot column, divide the highlighted entries by the pivot and place the result into $L$:

\[
\begin{bmatrix} 1 \\ -2 \\ 1 \\ -3 \end{bmatrix},\;
\begin{bmatrix} 1 \\ -3 \\ 4 \end{bmatrix},\;
\begin{bmatrix} 1 \\ 2 \end{bmatrix},\;
\begin{bmatrix} 1 \end{bmatrix},
\quad\text{and}\quad
L = \begin{bmatrix}
1 & 0 & 0 & 0 \\
-2 & 1 & 0 & 0 \\
1 & -3 & 1 & 0 \\
-3 & 4 & 2 & 1
\end{bmatrix}
\]

An easy calculation verifies that this $L$ and $U$ satisfy $LU = A$. ■
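The algorithm can be sketched in code as follows. This is a minimal illustration under the section's assumption that no row interchanges are needed; the function name `lu_without_interchanges` is hypothetical, and exact fractions are used so that the check $LU = A$ is exact. The multipliers that create zeros below each pivot are exactly the entries stored in $L$.

```python
from fractions import Fraction


def lu_without_interchanges(A):
    """Return (L, U) with A = LU, L unit lower triangular, U an echelon form.

    Uses only row replacements; raises ValueError if an interchange is needed.
    """
    m, n = len(A), len(A[0])
    U = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]
    pivot_row = 0
    for col in range(n):
        if pivot_row == m:
            break
        pivot = U[pivot_row][col]
        if pivot == 0:
            if any(U[i][col] != 0 for i in range(pivot_row + 1, m)):
                raise ValueError("a row interchange would be needed")
            continue  # not a pivot column; move to the next column
        for i in range(pivot_row + 1, m):
            multiplier = U[i][col] / pivot
            L[i][pivot_row] = multiplier  # entry below the pivot, divided by it
            U[i] = [u - multiplier * p for u, p in zip(U[i], U[pivot_row])]
        pivot_row += 1
    return L, U


# The matrix A of Example 2.
A = [[2, 4, -1, 5, -2],
     [-4, -5, 3, -8, 1],
     [2, -5, -4, 1, 8],
     [-6, 0, 7, -3, 1]]

L, U = lu_without_interchanges(A)
product = [[sum(L[i][k] * U[k][j] for k in range(4)) for j in range(5)]
           for i in range(4)]
assert product == A
```

In this sketch the third column of $A$ turns out not to be a pivot column, so the third column of $L$ is filled from the pivot found in column 4.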


In practical work, row interchanges are nearly always needed, because partial pivoting
is used for high accuracy. (Recall that this procedure selects, among the possible
choices for a pivot, an entry in the column having the largest absolute value.) To handle
row interchanges, the LU factorization above can be modified easily to produce an $L$
that is permuted lower triangular, in the sense that a rearrangement (called a permutation)
of the rows of $L$ can make $L$ (unit) lower triangular. The resulting permuted
LU factorization solves $A\mathbf{x} = \mathbf{b}$ in the same way as before, except that the reduction of
$[\,L \;\; \mathbf{b}\,]$ to $[\,I \;\; \mathbf{y}\,]$ follows the order of the pivots in $L$ from left to right, starting with
the pivot in the first column. A reference to an “LU factorization” usually includes the
possibility that $L$ might be permuted lower triangular. For details, see the Study Guide.


NUMERICAL NOTES - 

The following operation counts apply to an $n \times n$ dense matrix $A$ (with most
entries nonzero) for $n$ moderately large, say, $n > 30$.¹

1. Computing an LU factorization of $A$ takes about $2n^3/3$ flops (about the same
as row reducing $[\,A \;\; \mathbf{b}\,]$), whereas finding $A^{-1}$ requires about $2n^3$ flops.

2. Solving $L\mathbf{y} = \mathbf{b}$ and $U\mathbf{x} = \mathbf{y}$ requires about $2n^2$ flops, because any $n \times n$
triangular system can be solved in about $n^2$ flops.

3. Multiplication of $\mathbf{b}$ by $A^{-1}$ also requires about $2n^2$ flops, but the result may
not be as accurate as that obtained from $L$ and $U$ (because of roundoff error
when computing both $A^{-1}$ and $A^{-1}\mathbf{b}$).

4. If $A$ is sparse (with mostly zero entries), then $L$ and $U$ may be sparse, too,
whereas $A^{-1}$ is likely to be dense. In this case, a solution of $A\mathbf{x} = \mathbf{b}$ with an
LU factorization is much faster than using $A^{-1}$. See Exercise 31.




A Matrix Factorization in Electrical Engineering 

Matrix factorization is intimately related to the problem of constructing an electrical 
network with specified properties. The following discussion gives just a glimpse of the 
connection between factorization and circuit design. 


¹ See Section 3.8 in Applied Linear Algebra, 3rd ed., by Ben Noble and James W. Daniel (Englewood Cliffs,
NJ: Prentice-Hall, 1988). Recall that for our purposes, a flop is +, −, ×, or ÷.

Suppose the box in Figure 3 represents some sort of electric circuit, with an input
and output. Record the input voltage and current by $\begin{bmatrix} v_1 \\ i_1 \end{bmatrix}$ (with voltage $v$ in volts and
current $i$ in amps), and record the output voltage and current by $\begin{bmatrix} v_2 \\ i_2 \end{bmatrix}$. Frequently, the
transformation $\begin{bmatrix} v_1 \\ i_1 \end{bmatrix} \mapsto \begin{bmatrix} v_2 \\ i_2 \end{bmatrix}$ is linear. That is, there is a matrix $A$, called the transfer
matrix, such that

\[
\begin{bmatrix} v_2 \\ i_2 \end{bmatrix} = A \begin{bmatrix} v_1 \\ i_1 \end{bmatrix}
\]

FIGURE 3 A circuit with input and output terminals.


Figure 4 shows a ladder network, where two circuits (there could be more) are
connected in series, so that the output of one circuit becomes the input of the next circuit.
The left circuit in Figure 4 is called a series circuit, with resistance $R_1$ (in ohms).

FIGURE 4 A ladder network (a series circuit followed by a shunt circuit).

The right circuit in Figure 4 is a shunt circuit, with resistance $R_2$. Using Ohm’s law and
Kirchhoff’s laws, one can show that the transfer matrices of the series and shunt circuits,
respectively, are

\[
\underbrace{\begin{bmatrix} 1 & -R_1 \\ 0 & 1 \end{bmatrix}}_{\text{Transfer matrix of series circuit}}
\quad\text{and}\quad
\underbrace{\begin{bmatrix} 1 & 0 \\ -1/R_2 & 1 \end{bmatrix}}_{\text{Transfer matrix of shunt circuit}}
\]

EXAMPLE 3

a. Compute the transfer matrix of the ladder network in Figure 4.

b. Design a ladder network whose transfer matrix is $\begin{bmatrix} 1 & -8 \\ -.5 & 5 \end{bmatrix}$.

SOLUTION

a. Let $A_1$ and $A_2$ be the transfer matrices of the series and shunt circuits, respectively.
Then an input vector $\mathbf{x}$ is transformed first into $A_1\mathbf{x}$ and then into $A_2(A_1\mathbf{x})$. The series
connection of the circuits corresponds to composition of linear transformations, and
the transfer matrix of the ladder network is (note the order)

\[
A_2A_1 = \begin{bmatrix} 1 & 0 \\ -1/R_2 & 1 \end{bmatrix}
\begin{bmatrix} 1 & -R_1 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & -R_1 \\ -1/R_2 & 1 + R_1/R_2 \end{bmatrix}
\tag{6}
\]

b. To factor the matrix $\begin{bmatrix} 1 & -8 \\ -.5 & 5 \end{bmatrix}$ into the product of transfer matrices, as in
equation (6), look for $R_1$ and $R_2$ in Figure 4 to satisfy

\[
\begin{bmatrix} 1 & -R_1 \\ -1/R_2 & 1 + R_1/R_2 \end{bmatrix}
= \begin{bmatrix} 1 & -8 \\ -.5 & 5 \end{bmatrix}
\]

From the (1,2)-entries, $R_1 = 8$ ohms, and from the (2,1)-entries, $1/R_2 = .5$ ohm
and $R_2 = 1/.5 = 2$ ohms. With these values, the network in Figure 4 has the desired
transfer matrix. ■
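The computation in Example 3 can be mirrored in a short sketch. The function names `series_transfer` and `shunt_transfer` are hypothetical; they simply encode the two transfer matrices stated in the text, and the product is taken in the order shunt times series, matching equation (6).

```python
from fractions import Fraction


def matmul(P, Q):
    """Row-column product of matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)]
            for row in P]


def series_transfer(R1):
    """Transfer matrix of a series circuit with resistance R1 (ohms)."""
    return [[1, -R1], [0, 1]]


def shunt_transfer(R2):
    """Transfer matrix of a shunt circuit with resistance R2 (ohms)."""
    return [[1, 0], [-Fraction(1) / R2, 1]]


def ladder_transfer(R1, R2):
    """Series circuit first, then shunt circuit: the product A2 * A1."""
    return matmul(shunt_transfer(R2), series_transfer(R1))


# Values found in part (b): R1 = 8 ohms and R2 = 2 ohms.
network = ladder_transfer(8, 2)
assert network == [[1, -8], [Fraction(-1, 2), 5]]
```

Reversing the order of the two factors gives a different matrix in general, which is why the text stresses the order of the product.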



A network transfer matrix summarizes the input-output behavior (the design specifications)
of the network without reference to the interior circuits. To physically build
a network with specified properties, an engineer first determines if such a network
can be constructed (or realized). Then the engineer tries to factor the transfer matrix
into matrices corresponding to smaller circuits that perhaps are already manufactured
and ready for assembly. In the common case of alternating current, the entries in the
transfer matrix are usually rational complex-valued functions. (See Exercises 19 and 20
in Section 2.4 and Example 2 in Section 3.3.) A standard problem is to find a minimal
realization that uses the smallest number of electrical components.


PRACTICE PROBLEM 


Find an LU factorization of A = 



[Note: It will turn out that A has only three pivot columns, so the method of Example 2
will produce only the first three columns of L. The remaining two columns of L come
from $I_5$.]


2.5 EXERCISES 


In Exercises 1-6, solve the equation Ax = b by using the LU
factorization given for A. In Exercises 1 and 2, also solve Ax = b
by ordinary row reduction.

Find an LU factorization of the matrices in Exercises 7-16 (with
L unit lower triangular). Note that MATLAB will usually produce
a permuted LU factorization because it uses partial pivoting for
numerical accuracy.


7.

9.

11.

-5 6


17. When A is invertible, MATLAB finds $A^{-1}$ by factoring A = LU
(where L may be permuted lower triangular), inverting
L and U, and then computing $U^{-1}L^{-1}$. Use this method to
compute the inverse of A in Exercise 2. (Apply the algorithm
of Section 2.2 to L and to U.)


18. Find $A^{-1}$ as in Exercise 17, using A from Exercise 3.


19. Let A be a lower triangular n × n matrix with nonzero entries
on the diagonal. Show that A is invertible and $A^{-1}$ is lower
triangular. [Hint: Explain why A can be changed into I using
only row replacements and scaling. (Where are the pivots?)
Also, explain why the row operations that reduce A to I
change I into a lower triangular matrix.]


20. Let A = LU be an LU factorization. Explain why A can be
row reduced to U using only replacement operations. (This
fact is the converse of what was proved in the text.)


21. Suppose A = BC, where B is invertible. Show that any
sequence of row operations that reduces B to I also reduces
A to C. The converse is not true, since the zero matrix may
be factored as 0 = B · 0.


22. (Reduced LU Factorization) With A as in the Practice Problem,
find a 5 × 3 matrix B and a 3 × 4 matrix C such that
A = BC. Generalize this idea to the case where A is m × n,
A = LU, and U has only three nonzero rows.

23. (Rank Factorization) Suppose an m × n matrix A admits a
factorization A = CD where C is m × 4 and D is 4 × n.

a. Show that A is the sum of four outer products. (See
Section 2.4.)

b. Let m = 400 and n = 100. Explain why a computer
programmer might prefer to store the data from A in the
form of two matrices C and D.


24. (QR Factorization) Suppose A = QR, where Q and R are
n × n, R is invertible and upper triangular, and Q has the
property that $Q^TQ = I$. Show that for each b in $\mathbb{R}^n$, the
equation Ax = b has a unique solution. What computations
with Q and R will produce the solution?




25. (Singular Value Decomposition) Suppose $A = UDV^T$,
where U and V are n × n matrices with the property that
$U^TU = I$ and $V^TV = I$, and where D is a diagonal matrix
with positive numbers $\sigma_1, \ldots, \sigma_n$ on the diagonal. Show that
A is invertible, and find a formula for $A^{-1}$.


26. (Spectral Factorization) Suppose a 3 × 3 matrix A admits a
factorization as $A = PDP^{-1}$, where P is some invertible
3 × 3 matrix and D is the diagonal matrix





Show that this factorization is useful when computing high
powers of A. Find fairly simple formulas for $A^2$, $A^3$, and $A^k$
(k a positive integer), using P and the entries in D.


27. Design two different ladder networks that each output 9 volts 
and 4 amps when the input is 12 volts and 6 amps. 


28. Show that if three shunt circuits (with resistances $R_1$, $R_2$, $R_3$)
are connected in series, the resulting network has the same
transfer matrix as a single shunt circuit. Find a formula for
the resistance in that circuit.


29. a. Compute the transfer matrix of the network in the figure.

b. Let $A = \begin{bmatrix} 4/3 & -12 \\ -1/4 & 3 \end{bmatrix}$. Design a ladder network
whose transfer matrix is A by finding a suitable matrix
factorization of A.



Exercises 22-26 provide a glimpse of some widely used matrix
factorizations, some of which are discussed later in the text.

30. Find a different factorization of the A in Exercise 29, and
thereby design a different ladder network whose transfer
matrix is A.



31. [M] The solution to the steady-state heat flow problem for
the plate in the figure is approximated by the solution to the
equation Ax = b, where b = (5, 15, 0, 10, 0, 10, 20, 30) and

(Refer to Exercise 33 of Section 1.1.) The missing entries in
A are zeros. The nonzero entries of A lie within a band along 
the main diagonal. Such band matrices occur in a variety of 
applications and often are extremely large (with thousands of 
rows and columns but relatively narrow bands). 

a. Use the method of Example 2 to construct an LU factorization
of A, and note that both factors are band matrices
(with two nonzero diagonals below or above the main
diagonal). Compute LU - A to check your work.

b. Use the LU factorization to solve Ax = b.


c. Obtain $A^{-1}$ and note that $A^{-1}$ is a dense matrix with no
band structure. When A is large, L and U can be stored in
much less space than $A^{-1}$. This fact is another reason for
preferring the LU factorization of A to $A^{-1}$ itself.

32. [M] The band matrix A shown below can be used to estimate
the unsteady conduction of heat in a rod when the temperatures
at points $p_1, \ldots, p_5$ on the rod change with time.²

[Figure: a rod with equally spaced points $p_1, p_2, p_3, p_4, p_5$, a distance $\Delta x$ apart.]

The constant C in the matrix depends on the physical nature
of the rod, the distance $\Delta x$ between the points on the rod,
and the length of time $\Delta t$ between successive temperature
measurements. Suppose that for k = 0, 1, 2, ..., a vector $\mathbf{t}_k$
in $\mathbb{R}^5$ lists the temperatures at time $k\,\Delta t$. If the two ends of the
rod are maintained at 0°, then the temperature vectors satisfy
the equation $A\mathbf{t}_{k+1} = \mathbf{t}_k$ (k = 0, 1, ...), where

\[
A = \begin{bmatrix}
(1+2C) & -C & & & \\
-C & (1+2C) & -C & & \\
 & -C & (1+2C) & -C & \\
 & & -C & (1+2C) & -C \\
 & & & -C & (1+2C)
\end{bmatrix}
\]

a. Find the LU factorization of A when C = 1. A matrix
such as A with three nonzero diagonals is called a tridiagonal
matrix. The L and U factors are bidiagonal matrices.

b. Suppose C = 1 and $\mathbf{t}_0 = (10, 12, 12, 12, 10)$. Use the
LU factorization of A to find the temperature distributions
$\mathbf{t}_1$, $\mathbf{t}_2$, $\mathbf{t}_3$, and $\mathbf{t}_4$.


2 See Biswa N. Datta, Numerical Linear Algebra and Applications (Pacific 

Grove, CA: Brooks/Cole, 1994), pp. 200-201. 


SOLUTION TO PRACTICE PROBLEM 



Divide the entries in each highlighted column by the pivot at the top. The resulting
columns form the first three columns in the lower half of L. This suffices to make row
reduction of L to I correspond to reduction of A to U. Use the last two columns of $I_5$
to make L unit lower triangular.


2.6 THE LEONTIEF INPUT-OUTPUT MODEL



Linear algebra played an essential role in the Nobel prize-winning work of Wassily 
Leontief, as mentioned at the beginning of Chapter 1. The economic model described 
in this section is the basis for more elaborate models used in many parts of the world. 

Suppose a nation’s economy is divided into n sectors that produce goods or services, 
and let x be a production vector in $\mathbb{R}^n$ that lists the output of each sector for one
year. Also, suppose another part of the economy (called the open sector ) does not 
produce goods or services but only consumes them, and let d be a final demand vector 
(or bill of final demands) that lists the values of the goods and services demanded 
from the various sectors by the nonproductive part of the economy. The vector d can 
represent consumer demand, government consumption, surplus production, exports, or 
other external demands. 

As the various sectors produce goods to meet consumer demand, the producers 
themselves create additional intermediate demand for goods they need as inputs for 
their own production. The interrelations between the sectors are very complex, and the 
connection between the final demand and the production is unclear. Leontief asked if 
there is a production level x such that the amounts produced (or “supplied”) will exactly 
balance the total demand for that production, so that 


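\[
\begin{Bmatrix} \text{amount} \\ \text{produced} \\ \mathbf{x} \end{Bmatrix}
=
\begin{Bmatrix} \text{intermediate} \\ \text{demand} \end{Bmatrix}
+
\begin{Bmatrix} \text{final} \\ \text{demand} \\ \mathbf{d} \end{Bmatrix}
\tag{1}
\]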


The basic assumption of Leontief’s input-output model is that for each sector, there
is a unit consumption vector in $\mathbb{R}^n$ that lists the inputs needed per unit of output of
the sector. All input and output units are measured in millions of dollars, rather than in
quantities such as tons or bushels. (Prices of goods and services are held constant.)

As a simple example, suppose the economy consists of three sectors (manufacturing,
agriculture, and services) with unit consumption vectors $\mathbf{c}_1$, $\mathbf{c}_2$, and $\mathbf{c}_3$, as shown
in the table that follows.

                     Inputs Consumed per Unit of Output

Purchased from:      Manufacturing      Agriculture      Services
Manufacturing             .50               .40             .20
Agriculture               .20               .30             .10
Services                  .10               .10             .30
                           ↑                 ↑               ↑
                          c_1               c_2             c_3


EXAMPLE 1 What amounts will be consumed by the manufacturing sector if it 
decides to produce 100 units? 


SOLUTION Compute

\[
100\mathbf{c}_1 = 100 \begin{bmatrix} .50 \\ .20 \\ .10 \end{bmatrix}
= \begin{bmatrix} 50 \\ 20 \\ 10 \end{bmatrix}
\]


To produce 100 units, manufacturing will order (i.e., “demand”) and consume 50 units 
from other parts of the manufacturing sector, 20 units from agriculture, and 10 units 
from services. ■ 

If manufacturing decides to produce $x_1$ units of output, then $x_1\mathbf{c}_1$ represents the
intermediate demands of manufacturing, because the amounts in $x_1\mathbf{c}_1$ will be consumed
in the process of creating the $x_1$ units of output. Likewise, if $x_2$ and $x_3$ denote the planned
outputs of the agriculture and services sectors, $x_2\mathbf{c}_2$ and $x_3\mathbf{c}_3$ list their corresponding
intermediate demands. The total intermediate demand from all three sectors is given by

\[
\{\text{intermediate demand}\} = x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + x_3\mathbf{c}_3 = C\mathbf{x}
\tag{2}
\]


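where C is the consumption matrix $[\,\mathbf{c}_1 \;\; \mathbf{c}_2 \;\; \mathbf{c}_3\,]$, namely,

\[
C = \begin{bmatrix} .50 & .40 & .20 \\ .20 & .30 & .10 \\ .10 & .10 & .30 \end{bmatrix}
\tag{3}
\]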


Equations (1) and (2) yield Leontief’s model. 


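THE LEONTIEF INPUT-OUTPUT MODEL, OR PRODUCTION EQUATION

\[
\underbrace{\mathbf{x}}_{\substack{\text{Amount} \\ \text{produced}}}
=
\underbrace{C\mathbf{x}}_{\substack{\text{Intermediate} \\ \text{demand}}}
+
\underbrace{\mathbf{d}}_{\substack{\text{Final} \\ \text{demand}}}
\tag{4}
\]

Equation (4) may also be written as $I\mathbf{x} - C\mathbf{x} = \mathbf{d}$, or

\[
(I - C)\mathbf{x} = \mathbf{d}
\tag{5}
\]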

EXAMPLE 2 Consider the economy whose consumption matrix is given by (3).
Suppose the final demand is 50 units for manufacturing, 30 units for agriculture, and
20 units for services. Find the production level x that will satisfy this demand.

SOLUTION The coefficient matrix in (5) is

\[
I - C =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
-
\begin{bmatrix} .5 & .4 & .2 \\ .2 & .3 & .1 \\ .1 & .1 & .3 \end{bmatrix}
=
\begin{bmatrix} .5 & -.4 & -.2 \\ -.2 & .7 & -.1 \\ -.1 & -.1 & .7 \end{bmatrix}
\]

To solve (5), row reduce the augmented matrix

\[
\left[\begin{array}{rrr|r} .5 & -.4 & -.2 & 50 \\ -.2 & .7 & -.1 & 30 \\ -.1 & -.1 & .7 & 20 \end{array}\right]
\sim
\left[\begin{array}{rrr|r} 5 & -4 & -2 & 500 \\ -2 & 7 & -1 & 300 \\ -1 & -1 & 7 & 200 \end{array}\right]
\sim \cdots \sim
\left[\begin{array}{rrr|r} 1 & 0 & 0 & 226 \\ 0 & 1 & 0 & 119 \\ 0 & 0 & 1 & 78 \end{array}\right]
\]

The last column is rounded to the nearest whole unit. Manufacturing must produce
approximately 226 units, agriculture 119 units, and services only 78 units. ■
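The row reduction in Example 2 can be reproduced with a short program. The sketch below is an illustration only, not the text's method for machine computation: it uses exact fractions from Python's standard library and a hypothetical helper `solve` that performs Gauss-Jordan elimination, assuming the coefficient matrix is invertible.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination.

    Exact fractions avoid rounding error; a is assumed to be invertible.
    """
    n = len(a)
    # Build the augmented matrix [a | b].
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and swap it into place.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so the pivot is 1, then clear the rest of the column.
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]


# Consumption matrix (3) and the final demand of Example 2.
C = [["0.50", "0.40", "0.20"],
     ["0.20", "0.30", "0.10"],
     ["0.10", "0.10", "0.30"]]
d = [50, 30, 20]

n = len(C)
I_minus_C = [[(1 if i == j else 0) - Fraction(C[i][j]) for j in range(n)]
             for i in range(n)]

x = solve(I_minus_C, d)
print([round(v) for v in x])  # [226, 119, 78]
```

The exact solution is (6100/27, 3200/27, 700/9), which rounds to the production levels stated in the example.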

If the matrix $I - C$ is invertible, then we can apply Theorem 5 in Section 2.2, with
A replaced by $(I - C)$, and from the equation $(I - C)\mathbf{x} = \mathbf{d}$ obtain $\mathbf{x} = (I - C)^{-1}\mathbf{d}$.
The theorem below shows that in most practical cases, $I - C$ is invertible and the
production vector x is economically feasible, in the sense that the entries in x are
nonnegative.

In the theorem, the term column sum denotes the sum of the entries in a column
of a matrix. Under ordinary circumstances, the column sums of a consumption matrix
are less than 1 because a sector should require less than one unit’s worth of inputs to
produce one unit of output.


THEOREM 11 Let C be the consumption matrix for an economy, and let d be the final demand.
If C and d have nonnegative entries and if each column sum of C is less than 1,
then $(I - C)^{-1}$ exists and the production vector

\[
\mathbf{x} = (I - C)^{-1}\mathbf{d}
\]

has nonnegative entries and is the unique solution of

\[
\mathbf{x} = C\mathbf{x} + \mathbf{d}
\]


The following discussion will suggest why the theorem is true and will lead to a
new way to compute $(I - C)^{-1}$.

A Formula for $(I - C)^{-1}$

Imagine that the demand represented by d is presented to the various industries at the 
beginning of the year, and the industries respond by setting their production levels at 
x = d, which will exactly meet the final demand. As the industries prepare to produce d, 
they send out orders for their raw materials and other inputs. This creates an intermediate 
demand of Cd for inputs. 

To meet the additional demand of $C\mathbf{d}$, the industries will need as additional inputs
the amounts in $C(C\mathbf{d}) = C^2\mathbf{d}$. Of course, this creates a second round of intermediate
demand, and when the industries decide to produce even more to meet this new demand,
they create a third round of demand, namely, $C(C^2\mathbf{d}) = C^3\mathbf{d}$. And so it goes.

Theoretically, this process could continue indefinitely, although in real life it would
not take place in such a rigid sequence of events. We can diagram this hypothetical
situation as follows:

                          Demand That           Inputs Needed to
                          Must Be Met           Meet This Demand

Final demand                  d                       Cd
Intermediate demand
    1st round                 Cd                 C(Cd) = C^2 d
    2nd round                 C^2 d              C(C^2 d) = C^3 d
    3rd round                 C^3 d              C(C^3 d) = C^4 d
                               ⋮                       ⋮


The production level x that will meet all of this demand is

\[
\mathbf{x} = \mathbf{d} + C\mathbf{d} + C^2\mathbf{d} + C^3\mathbf{d} + \cdots
= (I + C + C^2 + C^3 + \cdots)\,\mathbf{d}
\tag{6}
\]

To make sense of equation (6), consider the following algebraic identity:

\[
(I - C)(I + C + C^2 + \cdots + C^m) = I - C^{m+1}
\tag{7}
\]

It can be shown that if the column sums in C are all strictly less than 1, then $I - C$ is
invertible, $C^m$ approaches the zero matrix as m gets arbitrarily large, and $I - C^{m+1} \to I$.
(This fact is analogous to the fact that if a positive number t is less than 1, then $t^m \to 0$
as m increases.) Using equation (7), write

\[
(I - C)^{-1} \approx I + C + C^2 + C^3 + \cdots + C^m
\tag{8}
\]

when the column sums of C are less than 1.

The approximation in (8) means that the right side can be made as close to $(I - C)^{-1}$
as desired by taking m sufficiently large.

In actual input-output models, powers of the consumption matrix approach the zero
matrix rather quickly. So (8) really provides a practical way to compute $(I - C)^{-1}$.
Likewise, for any d, the vectors $C^m\mathbf{d}$ approach the zero vector quickly, and (6) is a
practical way to solve $(I - C)\mathbf{x} = \mathbf{d}$. If the entries in C and d are nonnegative, then (6)
shows that the entries in x are nonnegative, too.
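Formula (6) suggests a simple iteration: start with $\mathbf{x}^{(0)} = \mathbf{d}$ and repeatedly compute $\mathbf{x}^{(k)} = \mathbf{d} + C\mathbf{x}^{(k-1)}$, so that $\mathbf{x}^{(k)} = (I + C + \cdots + C^k)\mathbf{d}$. The following sketch illustrates this on the data of Example 2. The function names, the stopping tolerance `tol`, and the cap `max_rounds` are assumptions made for the illustration, not part of the text.

```python
def mat_vec(c, x):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(cij * xj for cij, xj in zip(row, x)) for row in c]


def leontief_iterate(c, d, tol=1e-9, max_rounds=1000):
    """Approximate the solution of x = Cx + d by the partial sums in (6).

    Returns the approximate production vector and the number of rounds used.
    Convergence is expected when the column sums of c are less than 1.
    """
    x = list(d)  # x^(0) = d
    for rounds in range(1, max_rounds + 1):
        # x^(k) = d + C x^(k-1)
        x_next = [di + yi for di, yi in zip(d, mat_vec(c, x))]
        if max(abs(a - b) for a, b in zip(x_next, x)) < tol:
            return x_next, rounds
        x = x_next
    raise RuntimeError("iteration did not converge")


# Consumption matrix (3) and the final demand of Example 2.
C = [[0.50, 0.40, 0.20],
     [0.20, 0.30, 0.10],
     [0.10, 0.10, 0.30]]
d = [50.0, 30.0, 20.0]

x, rounds = leontief_iterate(C, d)
print([round(v) for v in x])  # [226, 119, 78]
```

Each pass adds one more round of intermediate demand, so the number of rounds needed reflects how quickly the powers of C shrink.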


The Economic Importance of Entries in $(I - C)^{-1}$

The entries in $(I - C)^{-1}$ are significant because they can be used to predict how the
production x will have to change when the final demand d changes. In fact, the entries
in column j of $(I - C)^{-1}$ are the increased amounts the various sectors will have to
produce in order to satisfy an increase of 1 unit in the final demand for output from
sector j. See Exercise 8.

NUMERICAL NOTE

In any applied problem (not just in economics), an equation $A\mathbf{x} = \mathbf{b}$ can always be
written as $(I - C)\mathbf{x} = \mathbf{b}$, with $C = I - A$. If the system is large and sparse (with
mostly zero entries), it can happen that the column sums of the absolute values in
C are less than 1. In this case, $C^m \to 0$. If $C^m$ approaches zero quickly enough,
(6) and (8) will provide practical formulas for solving $A\mathbf{x} = \mathbf{b}$ and finding $A^{-1}$.

PRACTICE PROBLEM

Suppose an economy has two sectors: goods and services. One unit of output from goods 
requires inputs of .2 unit from goods and .5 unit from services. One unit of output from 
services requires inputs of .4 unit from goods and .3 unit from services. There is a final 
demand of 20 units of goods and 30 units of services. Set up the Leontief input-output 
model for this situation. 


2.6 EXERCISES 


[Figure: sectors of the economy (labels include Agriculture and Manufacturing).]

Exercises 1-4 refer to an economy that is divided into three 
sectors—manufacturing, agriculture, and services. For each unit 
of output, manufacturing requires .10 unit from other companies 
in that sector, .30 unit from agriculture, and .30 unit from services. 
For each unit of output, agriculture uses .20 unit of its own output, 
.60 unit from manufacturing, and .10 unit from services. For each 
unit of output, the services sector consumes .10 unit from services, 
.60 unit from manufacturing, but no agricultural products. 

1. Construct the consumption matrix for this economy, and de¬ 
termine what intermediate demands are created if agriculture 
plans to produce 100 units. 

2. Determine the production levels needed to satisfy a final 
demand of 18 units for agriculture, with no final demand for 
the other sectors. (Do not compute an inverse matrix.) 

3. Determine the production levels needed to satisfy a final 
demand of 18 units for manufacturing, with no final demand 
for the other sectors. (Do not compute an inverse matrix.) 


4. Determine the production levels needed to satisfy a final de¬ 
mand of 18 units for manufacturing, 18 units for agriculture, 
and 0 units for services. 


5. Consider the production model x = Cx + d for an economy 
with two sectors, where 




\[
\mathbf{d} = \begin{bmatrix} 50 \\ 30 \end{bmatrix}
\]


Use an inverse matrix to determine the production level 
necessary to satisfy the final demand. 


6. Repeat Exercise 5 with C = 


7. Let C and d be as in Exercise 5.

a. Determine the production level necessary to satisfy a final
demand for 1 unit of output from sector 1.

b. Use an inverse matrix to determine the production level
necessary to satisfy a final demand of $\begin{bmatrix} 51 \\ 30 \end{bmatrix}$.

c. Use the fact that
$\begin{bmatrix} 51 \\ 30 \end{bmatrix} = \begin{bmatrix} 50 \\ 30 \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \end{bmatrix}$
to explain how and why the answers to parts (a) and (b) and to Exercise
5 are related.


8. Let C be an n × n consumption matrix whose column sums
are less than 1. Let x be the production vector that satisfies
a final demand d, and let $\Delta\mathbf{x}$ be a production vector that
satisfies a different final demand $\Delta\mathbf{d}$.

a. Show that if the final demand changes from d to $\mathbf{d} + \Delta\mathbf{d}$,
then the new production level must be $\mathbf{x} + \Delta\mathbf{x}$. Thus $\Delta\mathbf{x}$
gives the amounts by which production must change in
order to accommodate the change $\Delta\mathbf{d}$ in demand.

b. Let $\Delta\mathbf{d}$ be the vector in $\mathbb{R}^n$ with 1 as the first entry and
0’s elsewhere. Explain why the corresponding production
$\Delta\mathbf{x}$ is the first column of $(I - C)^{-1}$. This shows that the
first column of $(I - C)^{-1}$ gives the amounts the various
sectors must produce to satisfy an increase of 1 unit in the
final demand for output from sector 1.

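9. Solve the Leontief production equation for an economy with
three sectors, given that

\[
C = \begin{bmatrix} .2 & .2 & .0 \\ .3 & .1 & .3 \\ .1 & .0 & .2 \end{bmatrix}
\quad\text{and}\quad
\mathbf{d} = \begin{bmatrix} 40 \\ 60 \\ 80 \end{bmatrix}
\]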


10. The consumption matrix C for the U.S. economy in 1972
has the property that every entry in the matrix $(I - C)^{-1}$ is
nonzero (and positive).¹ What does that say about the effect
of raising the demand for the output of just one sector of the
economy?

11. The Leontief production equation, $\mathbf{x} = C\mathbf{x} + \mathbf{d}$, is usually
accompanied by a dual price equation,

\[
\mathbf{p} = C^T\mathbf{p} + \mathbf{v}
\]

where p is a price vector whose entries list the price per unit
for each sector’s output, and v is a value added vector whose
entries list the value added per unit of output. (Value added
includes wages, profit, depreciation, etc.) An important fact
in economics is that the gross domestic product (GDP) can
be expressed in two ways:

\[
\{\text{gross domestic product}\} = \mathbf{p}^T\mathbf{d} = \mathbf{v}^T\mathbf{x}
\]

Verify the second equality. [Hint: Compute $\mathbf{p}^T\mathbf{x}$ in two
ways.]


1 Wassily W. Leontief, “The World Economy of the Year 2000,” 
Scientific American , September 1980, pp. 206-231. 


12. Let C be a consumption matrix such that $C^m \to 0$ as
$m \to \infty$, and for m = 1, 2, ..., let $D_m = I + C + \cdots + C^m$.
Find a difference equation that relates $D_m$ and $D_{m+1}$ and
thereby obtain an iterative procedure for computing formula
(8) for $(I - C)^{-1}$.


13. [M] The consumption matrix C below is based on input-output
data for the U.S. economy in 1958, with data for 81
sectors grouped into 7 larger sectors: (1) nonmetal household
and personal products, (2) final metal products (such as motor
vehicles), (3) basic metal products and mining, (4) basic
nonmetal products and agriculture, (5) energy, (6) services,
and (7) entertainment and miscellaneous products.² Find the
production levels needed to satisfy the final demand d. (Units
are in millions of dollars.)

\[
C = \begin{bmatrix}
.1588 & .0064 & .0025 & .0304 & .0014 & .0083 & .1594 \\
.0057 & .2645 & .0436 & .0099 & .0083 & .0201 & .3413 \\
.0264 & .1506 & .3557 & .0139 & .0142 & .0070 & .0236 \\
.3299 & .0565 & .0495 & .3636 & .0204 & .0483 & .0649 \\
.0089 & .0081 & .0333 & .0295 & .3412 & .0237 & .0020 \\
.1190 & .0901 & .0996 & .1260 & .1722 & .2368 & .3369 \\
.0063 & .0126 & .0196 & .0098 & .0064 & .0132 & .0012
\end{bmatrix},
\qquad
\mathbf{d} = \begin{bmatrix} 74{,}000 \\ 56{,}000 \\ 10{,}500 \\ 25{,}000 \\ 17{,}500 \\ 196{,}000 \\ 5{,}000 \end{bmatrix}
\]

14. [M] The demand vector in Exercise 13 is reasonable for
1958 data, but Leontief’s discussion of the economy in the
reference cited there used a demand vector closer to 1964
data:

d = (99640, 75548, 14444, 33501, 23527, 263985, 6526)

Find the production levels needed to satisfy this demand.


15. [M] Use equation (6) to solve the problem in Exercise 13.
Set $\mathbf{x}^{(0)} = \mathbf{d}$, and for k = 1, 2, ..., compute
$\mathbf{x}^{(k)} = \mathbf{d} + C\mathbf{x}^{(k-1)}$. How many steps are needed to obtain
the answer in Exercise 13 to four significant figures?


2 Wassily W. Leontief, “The Structure of the U.S. Economy,”
Scientific American, April 1965, pp. 30-32.


SOLUTION TO PRACTICE PROBLEM

The following data are given:

                     Inputs Needed per Unit of Output

Purchased from:      Goods      Services      External Demand
Goods                 .2          .4                20
Services              .5          .3                30

The Leontief input-output model is $\mathbf{x} = C\mathbf{x} + \mathbf{d}$, where

\[
C = \begin{bmatrix} .2 & .4 \\ .5 & .3 \end{bmatrix}
\quad\text{and}\quad
\mathbf{d} = \begin{bmatrix} 20 \\ 30 \end{bmatrix}
\]


2.7 APPLICATIONS TO COMPUTER GRAPHICS 



FIGURE 1 

Regular N. 


Computer graphics are images displayed or animated on a computer screen. Applica¬ 
tions of computer graphics are widespread and growing rapidly. For instance, computer- 
aided design (CAD) is an integral part of many engineering processes, such as the 
aircraft design process described in the chapter introduction. The entertainment industry 
has made the most spectacular use of computer graphics —from the special effects in 
Amazing Spider-Man 2 to PlayStation 4 and Xbox One. 

Most interactive computer software for business and industry makes use of com¬ 
puter graphics in the screen displays and for other functions, such as graphical display 
of data, desktop publishing, and slide production for commercial and educational pre¬ 
sentations . Consequently, anyone studying a computer language invariably spends time 
learning how to use at least two-dimensional (2D) graphics. 

This section examines some of the basic mathematics used to manipulate and dis¬ 
play graphical images such as a wire-frame model of an airplane. Such an image (or 
picture) consists of a number of points, connecting lines or curves, and information 
about how to fill in closed regions bounded by the lines and curves. Often, curved lines 
are approximated by short straight-line segments, and a figure is defined mathematically 
by a list of points. 

Among the simplest 2D graphics symbols are letters used for labels on the screen. 
Some letters are stored as wire-frame objects; others that have curved portions are stored 
with additional mathematical formulas for the curves. 


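EXAMPLE 1 The capital letter N in Figure 1 is determined by eight points, or
vertices. The coordinates of the points can be stored in a data matrix, D.

\[
\begin{array}{r|cccccccc}
\text{Vertex:} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline
x\text{-coordinate} & 0 & .5 & .5 & 6 & 6 & 5.5 & 5.5 & 0 \\
y\text{-coordinate} & 0 & 0 & 6.42 & 0 & 8 & 8 & 1.58 & 8
\end{array}
\;= D
\]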

In addition to D , it is necessary to specify which vertices are connected by lines, but we 
omit this detail. ■ 


The main reason graphical objects are described by collections of straight-line segments
is that the standard transformations in computer graphics map line segments onto
other line segments. (For instance, see Exercise 27 in Section 1.8.) Once the vertices
that describe an object have been transformed, their images can be connected with the
appropriate straight lines to produce the complete image of the original object.


FIGURE 2

Slanted N.


EXAMPLE 2 Given $A = \begin{bmatrix} 1 & .25 \\ 0 & 1 \end{bmatrix}$, describe the effect of the
shear transformation $\mathbf{x} \mapsto A\mathbf{x}$ on the letter N in Example 1.


SOLUTION By definition of matrix multiplication, the columns of the product AD
contain the images of the vertices of the letter N.

\[
\begin{array}{r|cccccccc}
\text{Vertex:} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline
AD = & 0 & .5 & 2.105 & 6 & 8 & 7.5 & 5.895 & 2 \\
 & 0 & 0 & 6.420 & 0 & 8 & 8 & 1.580 & 8
\end{array}
\]

The transformed vertices are plotted in Figure 2, along with connecting line segments
that correspond to those in the original figure. ■
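The product AD can be recomputed directly. The sketch below is illustrative only; it assumes the shear matrix is A = [1 .25; 0 1] and that vertices 7 and 8 of the letter N are (5.5, 1.58) and (0, 8), values that are consistent with the entries of AD shown above. The helper name `matmul` is hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# Shear matrix (assumed): x-coordinates are shifted by .25 times the y-coordinate.
A = [[1, 0.25],
     [0, 1]]

# Data matrix for the letter N: row 0 holds x-coordinates, row 1 holds y-coordinates.
D = [[0, 0.5, 0.5, 6, 6, 5.5, 5.5, 0],
     [0, 0, 6.42, 0, 8, 8, 1.58, 8]]

AD = matmul(A, D)

# Round to three decimals to hide floating-point noise.
for row in AD:
    print([round(v, 3) for v in row])
```

Each column of `AD` is the image of one vertex, so the y-coordinates are unchanged and only the x-coordinates move.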

The italic N in Figure 2 looks a bit too wide. To compensate, shrink the width by a 
scale transformation that affects the x-coordinates of the points. 



FIGURE 3 

Composite transformation of N. 


EXAMPLE 3 Compute the matrix of the transformation that performs a shear trans¬ 
formation, as in Example 2, and then scales all x-coordinates by a factor of .75. 


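SOLUTION The matrix that multiplies the x-coordinate of a point by .75 is

\[
S = \begin{bmatrix} .75 & 0 \\ 0 & 1 \end{bmatrix}
\]

So the matrix of the composite transformation is

\[
SA = \begin{bmatrix} .75 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & .25 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} .75 & .1875 \\ 0 & 1 \end{bmatrix}
\]

The result of this composite transformation is shown in Figure 3.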


The mathematics of computer graphics is intimately connected with matrix multi¬ 
plication. Unfortunately, translating an object on a screen does not correspond directly 
to matrix multiplication because translation is not a linear transformation. The standard 
way to avoid this difficulty is to introduce what are called homogeneous coordinates. 





Homogeneous Coordinates 

Each point (x, y) in $\mathbb{R}^2$ can be identified with the point (x, y, 1) on the plane in $\mathbb{R}^3$
that lies one unit above the xy-plane. We say that (x, y) has homogeneous coordinates
(x, y, 1). For instance, the point (0, 0) has homogeneous coordinates (0, 0, 1).
Homogeneous coordinates for points are not added or multiplied by scalars, but they can be
transformed via multiplication by 3 × 3 matrices.

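EXAMPLE 4 A translation of the form $(x, y) \mapsto (x + h, y + k)$ is written in
homogeneous coordinates as $(x, y, 1) \mapsto (x + h, y + k, 1)$. This transformation can be
computed via matrix multiplication:

\[
\begin{bmatrix} 1 & 0 & h \\ 0 & 1 & k \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
=
\begin{bmatrix} x + h \\ y + k \\ 1 \end{bmatrix}
\]
■

EXAMPLE 5 Any linear transformation on $\mathbb{R}^2$ is represented with respect to
homogeneous coordinates by a partitioned matrix of the form
$\begin{bmatrix} A & 0 \\ 0 & 1 \end{bmatrix}$, where A is a 2 × 2 matrix. Typical examples are

\[
\underbrace{\begin{bmatrix} \cos\varphi & -\sin\varphi & 0 \\ \sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{bmatrix}}_{\substack{\text{Counterclockwise} \\ \text{rotation about the} \\ \text{origin, angle } \varphi}},
\qquad
\underbrace{\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}}_{\substack{\text{Reflection} \\ \text{through } y = x}},
\qquad
\underbrace{\begin{bmatrix} s & 0 & 0 \\ 0 & t & 0 \\ 0 & 0 & 1 \end{bmatrix}}_{\substack{\text{Scale } x \text{ by } s \\ \text{and } y \text{ by } t}}
\]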

[Figure panels: Original Figure; After Scaling; After Rotating; After Translating.]

Composite Transformations 

The movement of a figure on a computer screen often requires two or more basic trans¬ 
formations. The composition of such transformations corresponds to matrix multiplica¬ 
tion when homogeneous coordinates are used. 


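EXAMPLE 6 Find the 3 × 3 matrix that corresponds to the composite transformation
of a scaling by .3, a rotation of 90° about the origin, and finally a translation that
adds (-.5, 2) to each point of a figure.


SOLUTION If $\varphi = \pi/2$, then $\sin\varphi = 1$ and $\cos\varphi = 0$. From Examples 4 and 5, we
have

\[
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\xrightarrow{\text{Scale}}
\begin{bmatrix} .3 & 0 & 0 \\ 0 & .3 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\]
\[
\xrightarrow{\text{Rotate}}
\begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} .3 & 0 & 0 \\ 0 & .3 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\]
\[
\xrightarrow{\text{Translate}}
\begin{bmatrix} 1 & 0 & -.5 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} .3 & 0 & 0 \\ 0 & .3 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\]

The matrix for the composite transformation is

\[
\begin{bmatrix} 1 & 0 & -.5 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} .3 & 0 & 0 \\ 0 & .3 & 0 \\ 0 & 0 & 1 \end{bmatrix}
=
\begin{bmatrix} 0 & -.3 & -.5 \\ .3 & 0 & 2 \\ 0 & 0 & 1 \end{bmatrix}
\]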
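The composite matrix of Example 6 can be computed by multiplying the three 3 × 3 matrices in the order translate · rotate · scale. The following sketch is one way to check the arithmetic; the function names `scale`, `rotate`, and `translate` are hypothetical, and entries are rounded because cos(π/2) is not exactly zero in floating point.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def scale(s, t):
    """Scale x by s and y by t (homogeneous coordinates)."""
    return [[s, 0, 0], [0, t, 0], [0, 0, 1]]


def rotate(phi):
    """Counterclockwise rotation about the origin through angle phi (radians)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def translate(h, k):
    """Translation (x, y) -> (x + h, y + k) in homogeneous coordinates."""
    return [[1, 0, h], [0, 1, k], [0, 0, 1]]


# The transformation applied first sits rightmost in the product.
M = matmul(translate(-0.5, 2), matmul(rotate(math.pi / 2), scale(0.3, 0.3)))
M_rounded = [[round(v, 10) for v in row] for row in M]
for row in M_rounded:
    print(row)
```

Up to rounding, the rows are (0, -.3, -.5), (.3, 0, 2), and (0, 0, 1). Reordering the factors generally changes the result, since matrix multiplication is not commutative.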



3D Computer Graphics 

Some of the newest and most exciting work in computer graphics is connected with
molecular modeling. With 3D (three-dimensional) graphics, a biologist can examine a
simulated protein molecule and search for active sites that might accept a drug molecule.
The biologist can rotate and translate an experimental drug and attempt to attach it to the
protein. This ability to visualize potential chemical reactions is vital to modern drug and
cancer research. In fact, advances in drug design depend to some extent upon progress
in the ability of computer graphics to construct realistic simulations of molecules and
their interactions.¹

Current research in molecular modeling is focused on virtual reality, an environment
in which a researcher can see and feel the drug molecule slide into the protein. In
Figure 4, such tactile feedback is provided by a force-displaying remote manipulator.



FIGURE 4 Molecular modeling in virtual reality. 


Another design for virtual reality involves a helmet and glove that detect head, hand, and 
finger movements. The helmet contains two tiny computer screens, one for each eye. 
Making this virtual environment more realistic is a challenge to engineers, scientists, 
and mathematicians. The mathematics we examine here barely opens the door to this 
interesting field of research. 


Homogeneous 3D Coordinates 


By analogy with the 2D case, we say that (x, y, z, 1) are homogeneous coordinates for
the point (x, y, z) in $\mathbb{R}^3$. In general, (X, Y, Z, H) are homogeneous coordinates for
(x, y, z) if $H \neq 0$ and

\[
x = \frac{X}{H}, \qquad y = \frac{Y}{H}, \qquad z = \frac{Z}{H}
\tag{1}
\]

Each nonzero scalar multiple of (x, y, z, 1) gives a set of homogeneous coordinates
for (x, y, z). For instance, both (10, -6, 14, 2) and (-15, 9, -21, -3) are homogeneous
coordinates for (5, -3, 7).
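As a small illustration of the relation between (X, Y, Z, H) and (x, y, z), the sketch below divides the first three homogeneous coordinates by H. The function name `from_homogeneous` is hypothetical.

```python
def from_homogeneous(coords):
    """Recover (x, y, z) from homogeneous coordinates (X, Y, Z, H), with H nonzero."""
    X, Y, Z, H = coords
    if H == 0:
        raise ValueError("H must be nonzero")
    return (X / H, Y / H, Z / H)


print(from_homogeneous((10, -6, 14, 2)))    # (5.0, -3.0, 7.0)
print(from_homogeneous((-15, 9, -21, -3)))  # (5.0, -3.0, 7.0)
```

Both inputs describe the same point, which is why any nonzero scalar multiple of a set of homogeneous coordinates may be used interchangeably.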

The next example illustrates the transformations used in molecular modeling to 
move a drug into a protein molecule. 

EXAMPLE 7 Give 4 × 4 matrices for the following transformations:

a. Rotation about the y-axis through an angle of 30°. (By convention, a positive angle
is the counterclockwise direction when looking toward the origin from the positive
half of the axis of rotation; in this case, the y-axis.)

b. Translation by the vector p = (-6, 4, 5).

SOLUTION

a. First, construct the 3 × 3 matrix for the rotation. The vector $\mathbf{e}_1$ rotates down toward
the negative z-axis, stopping at $(\cos 30°, 0, -\sin 30°) = (\sqrt{3}/2, 0, -.5)$. The vector
$\mathbf{e}_2$ on the y-axis does not move, but $\mathbf{e}_3$ on the z-axis rotates down toward the positive
x-axis, stopping at $(\sin 30°, 0, \cos 30°) = (.5, 0, \sqrt{3}/2)$. See Figure 5. From Section
1.9, the standard matrix for this rotation is

\[
\begin{bmatrix} \sqrt{3}/2 & 0 & .5 \\ 0 & 1 & 0 \\ -.5 & 0 & \sqrt{3}/2 \end{bmatrix}
\]

So the rotation matrix for homogeneous coordinates is

\[
A = \begin{bmatrix} \sqrt{3}/2 & 0 & .5 & 0 \\ 0 & 1 & 0 & 0 \\ -.5 & 0 & \sqrt{3}/2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]

b. We want (x, y, z, 1) to map to (x - 6, y + 4, z + 5, 1). The matrix that does this is

\[
\begin{bmatrix} 1 & 0 & 0 & -6 \\ 0 & 1 & 0 & 4 \\ 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]

FIGURE 5


1 Robert Pool, “Computing in Science,” Science 256, 3 April 1992, p. 45.


Perspective Projections 

A three-dimensional object is represented on the two-dimensional computer screen by 
projecting the object onto a viewing plane. (We ignore other important steps, such as 
selecting the portion of the viewing plane to display on the screen.) For simplicity, let 
the xy -plane represent the computer screen, and imagine that the eye of a viewer is 
along the positive z-axis, at a point (0, 0, d). A perspective projection maps each point 
(x, y, z) onto an image point (x*, y*, 0) so that the two points and the eye position, 
called the center of projection, are on a line. See Figure 6(a). 




FIGURE 6 Perspective projection of $(x, y, z)$ onto $(x^*, y^*, 0)$, parts (a) and (b).


The triangle in the xz-plane in Figure 6(a) is redrawn in part (b) showing the lengths of line segments. Similar triangles show that

$$\frac{x^*}{d} = \frac{x}{d - z} \quad \text{and} \quad x^* = \frac{dx}{d - z} = \frac{x}{1 - z/d}$$

Similarly,

$$y^* = \frac{y}{1 - z/d}$$

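Using homogeneous coordinates, we can represent the perspective projection by a matrix, say, $P$. We want $(x, y, z, 1)$ to map into $\left(\dfrac{x}{1 - z/d}, \dfrac{y}{1 - z/d}, 0, 1\right)$. Scaling these coordinates by $1 - z/d$, we can also use $(x, y, 0, 1 - z/d)$ as homogeneous coordinates for the image. Now it is easy to display $P$. In fact,

$$P \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & -1/d & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \begin{bmatrix} x \\ y \\ 0 \\ 1 - z/d \end{bmatrix}$$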


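EXAMPLE 8 Let $S$ be the box with vertices $(3, 1, 5)$, $(5, 1, 5)$, $(5, 0, 5)$, $(3, 0, 5)$, $(3, 1, 4)$, $(5, 1, 4)$, $(5, 0, 4)$, and $(3, 0, 4)$. Find the image of $S$ under the perspective projection with center of projection at $(0, 0, 10)$.

SOLUTION Let $P$ be the projection matrix, and let $D$ be the data matrix for $S$ using homogeneous coordinates. The data matrix for the image of $S$ is

$$PD = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & -1/10 & 1 \end{bmatrix} \begin{bmatrix} 3 & 5 & 5 & 3 & 3 & 5 & 5 & 3 \\ 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 \\ 5 & 5 & 5 & 5 & 4 & 4 & 4 & 4 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 3 & 5 & 5 & 3 & 3 & 5 & 5 & 3 \\ 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ .5 & .5 & .5 & .5 & .6 & .6 & .6 & .6 \end{bmatrix}$$

Here the columns of $D$ and of $PD$ correspond to vertices 1 through 8, in the order listed above.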

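To obtain $\mathbb{R}^3$ coordinates, use equation (1) before Example 7, and divide the top three entries in each column by the corresponding entry in the fourth row:

$$\begin{array}{rcccccccc} \text{Vertex:} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ & 6 & 10 & 10 & 6 & 5 & 8.3 & 8.3 & 5 \\ & 2 & 2 & 0 & 0 & 1.7 & 1.7 & 0 & 0 \\ & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{array}$$

(Figure caption: $S$ under the perspective transformation.)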
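The computation in Example 8 can be reproduced with a short Python sketch. The function name `perspective_image` is a hypothetical helper; it applies the homogeneous image $(x, y, 0, 1 - z/d)$ and then divides by the last coordinate, using exact fractions so the rounded table entries (8.3, 1.7) can be compared with exact values.

```python
from fractions import Fraction

def perspective_image(point, d):
    """Project (x, y, z) onto the xy-plane with center of projection (0, 0, d)."""
    x, y, z = (Fraction(t) for t in point)
    # The homogeneous image is (x, y, 0, 1 - z/d); divide by the last entry.
    h = 1 - z / Fraction(d)
    if h == 0:
        raise ValueError("point lies in the plane z = d and has no image")
    return (x / h, y / h, Fraction(0))

box = [(3, 1, 5), (5, 1, 5), (5, 0, 5), (3, 0, 5),
       (3, 1, 4), (5, 1, 4), (5, 0, 4), (3, 0, 4)]

for k, vertex in enumerate(box, start=1):
    image = perspective_image(vertex, 10)
    print(k, [round(float(t), 1) for t in image])
```

Vertex 6, for example, has exact image $(25/3, 5/3, 0)$, which the table rounds to $(8.3, 1.7, 0)$.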



This text’s web site has some interesting applications of computer graphics, including a further discussion of perspective projections. One of the computer projects on the web site involves simple animation.

NUMERICAL NOTE

Continuous movement of graphical 3D objects requires intensive computation with $4 \times 4$ matrices, particularly when the surfaces are rendered to appear realistic, with texture and appropriate lighting. High-end computer graphics boards have $4 \times 4$ matrix operations and graphics algorithms embedded in their microchips and circuitry. Such boards can perform the billions of matrix multiplications per second needed for realistic color animation in 3D gaming programs. 2


Further Reading 

James D. Foley, Andries van Dam, Steven K. Feiner, and John F. Hughes, Computer 
Graphics: Principles and Practice , 3rd ed. (Boston, MA: Addison-Wesley, 2002), 
Chapters 5 and 6. 


PRACTICE PROBLEM 


Rotation of a figure about a point $\mathbf{p}$ in $\mathbb{R}^2$ is accomplished by first translating the figure by $-\mathbf{p}$, rotating about the origin, and then translating back by $\mathbf{p}$. See Figure 7. Construct the $3 \times 3$ matrix that rotates points $-30°$ about the point $(-2, 6)$, using homogeneous coordinates.

FIGURE 7 Rotation of figure about point $\mathbf{p}$. (a) Original figure. (b) Translated to origin by $-\mathbf{p}$. (c) Rotated about the origin. (d) Translated back by $\mathbf{p}$.

2.7 EXERCISES 


1. What $3 \times 3$ matrix will have the same effect on homogeneous coordinates for $\mathbb{R}^2$ that the shear matrix $A$ has in Example 2?

2. Use matrix multiplication to find the image of the triangle with data matrix $D = \begin{bmatrix} 5 & 2 & 4 \\ 0 & 2 & 3 \end{bmatrix}$ under the transformation that reflects points through the y-axis. Sketch both the original triangle and its image.


In Exercises 3-8, find the $3 \times 3$ matrices that produce the described composite 2D transformations, using homogeneous coordinates.

3. Translate by $(3, 1)$, and then rotate 45° about the origin.

4. Translate by $(-2, 3)$, and then scale the x-coordinate by .8 and the y-coordinate by 1.2.

5. Reflect points through the x-axis, and then rotate 30° about the origin.

6. Rotate points 30°, and then reflect through the x-axis.

7. Rotate points through 60° about the point $(6, 8)$.

8. Rotate points through 45° about the point $(3, 7)$.

9. A $2 \times 200$ data matrix $D$ contains the coordinates of 200 points. Compute the number of multiplications required to transform these points using two arbitrary $2 \times 2$ matrices $A$ and $B$. Consider the two possibilities $A(BD)$ and $(AB)D$. Discuss the implications of your results for computer graphics calculations.

10. Consider the following geometric 2D transformations: $D$, a dilation (in which x-coordinates and y-coordinates are scaled by the same factor); $R$, a rotation; and $T$, a translation. Does $D$ commute with $R$? That is, is $D(R(\mathbf{x})) = R(D(\mathbf{x}))$ for all $\mathbf{x}$ in $\mathbb{R}^2$? Does $D$ commute with $T$? Does $R$ commute with $T$?

2 See Jan Ozer, “High-Performance Graphics Boards,” PC Magazine 19, 1 September 2000, pp. 187-200. Also, “The Ultimate Upgrade Guide: Moving On Up,” PC Magazine 21, 29 January 2002, pp. 82-91.


11. A rotation on a computer screen is sometimes implemented as the product of two shear-and-scale transformations, which can speed up calculations that determine how a graphic image actually appears in terms of screen pixels. (The screen consists of rows and columns of small dots, called pixels.) The first transformation $A_1$ shears vertically and then compresses each column of pixels; the second transformation $A_2$ shears horizontally and then stretches each row of pixels. Let

$$A_1 = \begin{bmatrix} 1 & 0 & 0 \\ \sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad A_2 = \begin{bmatrix} \sec\varphi & -\tan\varphi & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Show that the composition of the two transformations is a rotation in $\mathbb{R}^2$.




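12. A rotation in $\mathbb{R}^2$ usually requires four multiplications. Compute the product below, and show that the matrix for a rotation can be factored into three shear transformations (each of which requires only one multiplication).

$$\begin{bmatrix} 1 & -\tan\varphi/2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ \sin\varphi & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & -\tan\varphi/2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$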


13. The usual transformations on homogeneous coordinates for 2D computer graphics involve $3 \times 3$ matrices of the form $\begin{bmatrix} A & \mathbf{p} \\ \mathbf{0}^T & 1 \end{bmatrix}$, where $A$ is a $2 \times 2$ matrix and $\mathbf{p}$ is in $\mathbb{R}^2$. Show that such a transformation amounts to a linear transformation on $\mathbb{R}^2$ followed by a translation. [Hint: Find an appropriate matrix factorization involving partitioned matrices.]



14. Show that the transformation in Exercise 7 is equivalent to 
a rotation about the origin followed by a translation by p. 
Find p. 


15. What vector in $\mathbb{R}^3$ has homogeneous coordinates $\left(\frac{1}{2}, -\frac{1}{4}, \frac{1}{8}, \frac{1}{24}\right)$?

16. Are $(1, -2, 3, 4)$ and $(10, -20, 30, 40)$ homogeneous coordinates for the same point in $\mathbb{R}^3$? Why or why not?


17. Give the $4 \times 4$ matrix that rotates points in $\mathbb{R}^3$ about the y-axis through an angle of 60°. (See the figure.)


18. Give the $4 \times 4$ matrix that rotates points in $\mathbb{R}^3$ about the z-axis through an angle of $-30°$, and then translates by $\mathbf{p} = (5, -2, 1)$.

19. Let $S$ be the triangle with vertices $(4.2, 1.2, 4)$, $(6, 4, 2)$, $(2, 2, 6)$. Find the image of $S$ under the perspective projection with center of projection at $(0, 0, 10)$.

20. Let $S$ be the triangle with vertices $(9, 3, -5)$, $(12, 8, 2)$, $(1.8, 2.7, 1)$. Find the image of $S$ under the perspective projection with center of projection at $(0, 0, 10)$.

Exercises 21 and 22 concern the way in which color is specified for display in computer graphics. A color on a computer screen is encoded by three numbers $(R, G, B)$ that list the amount of energy an electron gun must transmit to red, green, and blue phosphor dots on the computer screen. (A fourth number specifies the luminance or intensity of the color.)



21. [M] The actual color a viewer sees on a screen is influenced by the specific type and amount of phosphors on the screen. So each computer screen manufacturer must convert between the $(R, G, B)$ data and an international CIE standard for color, which uses three primary colors, called $X$, $Y$, and $Z$. A typical conversion for short-persistence phosphors is

$$\begin{bmatrix} .61 & .29 & .150 \\ .35 & .59 & .063 \\ .04 & .12 & .787 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} = \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}$$

A computer program will send a stream of color information to the screen, using standard CIE data $(X, Y, Z)$. Find the equation that converts these data to the $(R, G, B)$ data needed for the screen’s electron gun.

22. [M] The signal broadcast by commercial television describes each color by a vector $(Y, I, Q)$. If the screen is black and white, only the $Y$-coordinate is used. (This gives a better monochrome picture than using CIE data for colors.) The correspondence between YIQ and a “standard” RGB color is given by

$$\begin{bmatrix} Y \\ I \\ Q \end{bmatrix} = \begin{bmatrix} .299 & .587 & .114 \\ .596 & -.275 & -.321 \\ .212 & -.528 & .311 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$$

(A screen manufacturer would change the matrix entries to work for its RGB screens.) Find the equation that converts the YIQ data transmitted by the television station to the RGB data needed for the television screen.


SOLUTION TO PRACTICE PROBLEM 


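Assemble the matrices right-to-left for the three operations. Using $\mathbf{p} = (-2, 6)$, $\cos(-30°) = \sqrt{3}/2$, and $\sin(-30°) = -.5$, we have

$$\underbrace{\begin{bmatrix} 1 & 0 & -2 \\ 0 & 1 & 6 \\ 0 & 0 & 1 \end{bmatrix}}_{\text{Translate back by } \mathbf{p}} \underbrace{\begin{bmatrix} \sqrt{3}/2 & 1/2 & 0 \\ -1/2 & \sqrt{3}/2 & 0 \\ 0 & 0 & 1 \end{bmatrix}}_{\text{Rotate around the origin}} \underbrace{\begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & -6 \\ 0 & 0 & 1 \end{bmatrix}}_{\text{Translate by } -\mathbf{p}} = \begin{bmatrix} \sqrt{3}/2 & 1/2 & \sqrt{3} - 5 \\ -1/2 & \sqrt{3}/2 & -3\sqrt{3} + 5 \\ 0 & 0 & 1 \end{bmatrix}$$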
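The translate, rotate, translate-back recipe can be sketched in Python as follows. The names `matmul` and `rotate_about` are illustrative helpers chosen for this sketch. A useful sanity check is that the center of rotation $(-2, 6)$ is left fixed by the resulting matrix.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def rotate_about(p, deg):
    """3x3 homogeneous matrix rotating points of the plane by `deg` degrees about p."""
    c = math.cos(math.radians(deg))
    s = math.sin(math.radians(deg))
    px, py = p
    back = [[1, 0, px], [0, 1, py], [0, 0, 1]]     # translate back by p
    rot = [[c, -s, 0], [s, c, 0], [0, 0, 1]]       # rotate about the origin
    away = [[1, 0, -px], [0, 1, -py], [0, 0, 1]]   # translate by -p
    return matmul(back, matmul(rot, away))          # assembled right-to-left

M = rotate_about((-2, 6), -30)
for row in M:
    print([round(entry, 4) for entry in row])

# The center of rotation should map to itself (up to rounding).
center = [-2, 6, 1]
print([round(sum(M[i][j] * center[j] for j in range(3)), 10) for i in range(3)])
```

The last column of `M` should agree, up to rounding, with $\sqrt{3} - 5$ and $-3\sqrt{3} + 5$ from the solution above.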


2.8 SUBSPACES OF $\mathbb{R}^n$

This section focuses on important sets of vectors in $\mathbb{R}^n$ called subspaces. Often subspaces arise in connection with some matrix $A$, and they provide useful information about the equation $A\mathbf{x} = \mathbf{b}$. The concepts and terminology in this section will be used repeatedly throughout the rest of the book. 1


DEFINITION

A subspace of $\mathbb{R}^n$ is any set $H$ in $\mathbb{R}^n$ that has three properties:

a. The zero vector is in $H$.

b. For each $\mathbf{u}$ and $\mathbf{v}$ in $H$, the sum $\mathbf{u} + \mathbf{v}$ is in $H$.

c. For each $\mathbf{u}$ in $H$ and each scalar $c$, the vector $c\mathbf{u}$ is in $H$.

FIGURE 1 Span$\{\mathbf{v}_1, \mathbf{v}_2\}$ as a plane through the origin.

In words, a subspace is closed under addition and scalar multiplication. As you will see in the next few examples, most sets of vectors discussed in Chapter 1 are subspaces. For instance, a plane through the origin is the standard way to visualize the subspace in Example 1. See Figure 1.

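EXAMPLE 1 If $\mathbf{v}_1$ and $\mathbf{v}_2$ are in $\mathbb{R}^n$ and $H = \text{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$, then $H$ is a subspace of $\mathbb{R}^n$. To verify this statement, note that the zero vector is in $H$ (because $0\mathbf{v}_1 + 0\mathbf{v}_2$ is a linear combination of $\mathbf{v}_1$ and $\mathbf{v}_2$). Now take two arbitrary vectors in $H$, say,

$$\mathbf{u} = s_1\mathbf{v}_1 + s_2\mathbf{v}_2 \quad \text{and} \quad \mathbf{v} = t_1\mathbf{v}_1 + t_2\mathbf{v}_2$$

Then

$$\mathbf{u} + \mathbf{v} = (s_1 + t_1)\mathbf{v}_1 + (s_2 + t_2)\mathbf{v}_2$$

which shows that $\mathbf{u} + \mathbf{v}$ is a linear combination of $\mathbf{v}_1$ and $\mathbf{v}_2$ and hence is in $H$. Also, for any scalar $c$, the vector $c\mathbf{u}$ is in $H$, because $c\mathbf{u} = c(s_1\mathbf{v}_1 + s_2\mathbf{v}_2) = (cs_1)\mathbf{v}_1 + (cs_2)\mathbf{v}_2$. ■

If $\mathbf{v}_1$ is not zero and if $\mathbf{v}_2$ is a multiple of $\mathbf{v}_1$, then $\mathbf{v}_1$ and $\mathbf{v}_2$ simply span a line through the origin. So a line through the origin is another example of a subspace.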


1 Sections 2.8 and 2.9 are included here to permit readers to postpone the study of most or all of the next two 
chapters and to skip directly to Chapter 5, if so desired. Omit these two sections if you plan to work through 
Chapter 4 before beginning Chapter 5. 
EXAMPLE 2 A line $L$ not through the origin is not a subspace, because it does not contain the origin, as required. Also, Figure 2 shows that $L$ is not closed under addition or scalar multiplication. ■

EXAMPLE 3 For $\mathbf{v}_1, \ldots, \mathbf{v}_p$ in $\mathbb{R}^n$, the set of all linear combinations of $\mathbf{v}_1, \ldots, \mathbf{v}_p$ is a subspace of $\mathbb{R}^n$. The verification of this statement is similar to the argument given in Example 1. We shall now refer to Span$\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ as the subspace spanned (or generated) by $\mathbf{v}_1, \ldots, \mathbf{v}_p$. ■

Note that $\mathbb{R}^n$ is a subspace of itself because it has the three properties required for a subspace. Another special subspace is the set consisting of only the zero vector in $\mathbb{R}^n$. This set, called the zero subspace, also satisfies the conditions for a subspace.

Column Space and Null Space of a Matrix 

Subspaces of $\mathbb{R}^n$ usually occur in applications and theory in one of two ways. In both cases, the subspace can be related to a matrix.

DEFINITION

The column space of a matrix $A$ is the set Col $A$ of all linear combinations of the columns of $A$.

If $A = [\mathbf{a}_1 \ \cdots \ \mathbf{a}_n]$, with the columns in $\mathbb{R}^m$, then Col $A$ is the same as Span$\{\mathbf{a}_1, \ldots, \mathbf{a}_n\}$. Example 3 shows that the column space of an $m \times n$ matrix is a subspace of $\mathbb{R}^m$. Note that Col $A$ equals $\mathbb{R}^m$ only when the columns of $A$ span $\mathbb{R}^m$. Otherwise, Col $A$ is only part of $\mathbb{R}^m$.





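EXAMPLE 4 Let $A = \begin{bmatrix} 1 & -3 & -4 \\ -4 & 6 & -2 \\ -3 & 7 & 6 \end{bmatrix}$ and $\mathbf{b} = \begin{bmatrix} 3 \\ 3 \\ -4 \end{bmatrix}$. Determine whether $\mathbf{b}$ is in the column space of $A$.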



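SOLUTION The vector $\mathbf{b}$ is a linear combination of the columns of $A$ if and only if $\mathbf{b}$ can be written as $A\mathbf{x}$ for some $\mathbf{x}$, that is, if and only if the equation $A\mathbf{x} = \mathbf{b}$ has a solution. Row reducing the augmented matrix $[A \ \ \mathbf{b}]$,

$$\begin{bmatrix} 1 & -3 & -4 & 3 \\ -4 & 6 & -2 & 3 \\ -3 & 7 & 6 & -4 \end{bmatrix} \sim \begin{bmatrix} 1 & -3 & -4 & 3 \\ 0 & -6 & -18 & 15 \\ 0 & -2 & -6 & 5 \end{bmatrix} \sim \begin{bmatrix} 1 & -3 & -4 & 3 \\ 0 & -6 & -18 & 15 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

we conclude that $A\mathbf{x} = \mathbf{b}$ is consistent and $\mathbf{b}$ is in Col $A$.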
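The test used in Example 4 (row reduce $[A \ \ \mathbf{b}]$ and look for a pivot in the augmented column) can be sketched in Python. The helpers `rref` and `in_column_space` are illustrative names written for this sketch; exact fractions avoid rounding issues.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced echelon form, list of pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [x / piv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def in_column_space(A, b):
    """True when Ax = b is consistent, i.e. no pivot in the augmented column."""
    augmented = [row + [bi] for row, bi in zip(A, b)]
    _, pivots = rref(augmented)
    return len(A[0]) not in pivots

A = [[1, -3, -4], [-4, 6, -2], [-3, 7, 6]]
print(in_column_space(A, [3, 3, -4]))   # True, as in Example 4
print(in_column_space(A, [3, 3, -3]))   # False: the system is inconsistent
```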
The solution of Example 4 shows that when a system of linear equations is written in the form $A\mathbf{x} = \mathbf{b}$, the column space of $A$ is the set of all $\mathbf{b}$ for which the system has a solution.

DEFINITION

The null space of a matrix $A$ is the set Nul $A$ of all solutions of the homogeneous equation $A\mathbf{x} = \mathbf{0}$.

When $A$ has $n$ columns, the solutions of $A\mathbf{x} = \mathbf{0}$ belong to $\mathbb{R}^n$, and the null space of $A$ is a subset of $\mathbb{R}^n$. In fact, Nul $A$ has the properties of a subspace of $\mathbb{R}^n$.

THEOREM 12

The null space of an $m \times n$ matrix $A$ is a subspace of $\mathbb{R}^n$. Equivalently, the set of all solutions of a system $A\mathbf{x} = \mathbf{0}$ of $m$ homogeneous linear equations in $n$ unknowns is a subspace of $\mathbb{R}^n$.

PROOF The zero vector is in Nul $A$ (because $A\mathbf{0} = \mathbf{0}$). To show that Nul $A$ satisfies the other two properties required for a subspace, take any $\mathbf{u}$ and $\mathbf{v}$ in Nul $A$. That is, suppose $A\mathbf{u} = \mathbf{0}$ and $A\mathbf{v} = \mathbf{0}$. Then, by a property of matrix multiplication,

$$A(\mathbf{u} + \mathbf{v}) = A\mathbf{u} + A\mathbf{v} = \mathbf{0} + \mathbf{0} = \mathbf{0}$$

Thus $\mathbf{u} + \mathbf{v}$ satisfies $A\mathbf{x} = \mathbf{0}$, and so $\mathbf{u} + \mathbf{v}$ is in Nul $A$. Also, for any scalar $c$, $A(c\mathbf{u}) = c(A\mathbf{u}) = c(\mathbf{0}) = \mathbf{0}$, which shows that $c\mathbf{u}$ is in Nul $A$. ■

To test whether a given vector v is in Nul A, just compute Av to see whether Av is 
the zero vector. Because Nul A is described by a condition that must be checked for each 
vector, we say that the null space is defined implicitly. In contrast, the column space is 
defined explicitly , because vectors in Col A can be constructed (by linear combinations) 
from the columns of A. To create an explicit description of Nul A, solve the equation 
Ax = 0 and write the solution in parametric vector form. (See Example 6, below.) 2 
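The implicit test for membership in Nul $A$ amounts to one matrix-vector product. Here is a minimal Python sketch; `in_null_space` is a hypothetical helper name, and the matrix and vector are the ones from Practice Problem 1 of this section.

```python
def in_null_space(A, v):
    """True when Av is the zero vector (each row of A dotted with v is 0)."""
    return all(sum(a * x for a, x in zip(row, v)) == 0 for row in A)

A = [[1, -1, 5],
     [2, 0, 7],
     [-3, -5, -3]]

print(in_null_space(A, [-7, 3, 2]))   # True: Au = 0
print(in_null_space(A, [1, 0, 0]))    # False: A e1 is the first column of A
```

With integer or fraction entries the comparison with zero is exact; with floating-point data a small tolerance would be needed instead.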

Basis for a Subspace 

Because a subspace typically contains an infinite number of vectors, some problems 
involving a subspace are handled best by working with a small finite set of vectors that 
span the subspace. The smaller the set, the better. It can be shown that the smallest 
possible spanning set must be linearly independent. 


DEFINITION

A basis for a subspace $H$ of $\mathbb{R}^n$ is a linearly independent set in $H$ that spans $H$.

FIGURE 3 The standard basis for $\mathbb{R}^3$.

EXAMPLE 5 The columns of an invertible $n \times n$ matrix form a basis for all of $\mathbb{R}^n$ because they are linearly independent and span $\mathbb{R}^n$, by the Invertible Matrix Theorem. One such matrix is the $n \times n$ identity matrix. Its columns are denoted by $\mathbf{e}_1, \ldots, \mathbf{e}_n$:

$$\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad \mathbf{e}_n = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}$$

The set $\{\mathbf{e}_1, \ldots, \mathbf{e}_n\}$ is called the standard basis for $\mathbb{R}^n$. See Figure 3. ■


2 The contrast between Nul A and Col A is discussed further in Section 4.2. 
The next example shows that the standard procedure for writing the solution set of $A\mathbf{x} = \mathbf{0}$ in parametric vector form actually identifies a basis for Nul $A$. This fact will be used throughout Chapter 5.


EXAMPLE 6 Find a basis for the null space of the matrix

$$A = \begin{bmatrix} -3 & 6 & -1 & 1 & -7 \\ 1 & -2 & 2 & 3 & -1 \\ 2 & -4 & 5 & 8 & -4 \end{bmatrix}$$

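SOLUTION First, write the solution of $A\mathbf{x} = \mathbf{0}$ in parametric vector form:

$$[A \ \ \mathbf{0}] \sim \begin{bmatrix} 1 & -2 & 0 & -1 & 3 & 0 \\ 0 & 0 & 1 & 2 & -2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \qquad \begin{aligned} x_1 - 2x_2 - x_4 + 3x_5 &= 0 \\ x_3 + 2x_4 - 2x_5 &= 0 \\ 0 &= 0 \end{aligned}$$

The general solution is $x_1 = 2x_2 + x_4 - 3x_5$, $x_3 = -2x_4 + 2x_5$, with $x_2$, $x_4$, and $x_5$ free.

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} 2x_2 + x_4 - 3x_5 \\ x_2 \\ -2x_4 + 2x_5 \\ x_4 \\ x_5 \end{bmatrix} = x_2 \underbrace{\begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}}_{\mathbf{u}} + x_4 \underbrace{\begin{bmatrix} 1 \\ 0 \\ -2 \\ 1 \\ 0 \end{bmatrix}}_{\mathbf{v}} + x_5 \underbrace{\begin{bmatrix} -3 \\ 0 \\ 2 \\ 0 \\ 1 \end{bmatrix}}_{\mathbf{w}} = x_2\mathbf{u} + x_4\mathbf{v} + x_5\mathbf{w} \tag{1}$$

Equation (1) shows that Nul $A$ coincides with the set of all linear combinations of $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$. That is, $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$ generates Nul $A$. In fact, this construction of $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ automatically makes them linearly independent, because equation (1) shows that $\mathbf{0} = x_2\mathbf{u} + x_4\mathbf{v} + x_5\mathbf{w}$ only if the weights $x_2$, $x_4$, and $x_5$ are all zero. (Examine entries 2, 4, and 5 in the vector $x_2\mathbf{u} + x_4\mathbf{v} + x_5\mathbf{w}$.) So $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$ is a basis for Nul $A$. ■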
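The procedure of Example 6 (one basis vector per free variable, read off from the reduced echelon form) can be sketched in Python. The helper names `rref` and `null_space_basis` are assumptions of this sketch, and the input below is the reduced coefficient matrix displayed in the solution.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced echelon form, list of pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [x / piv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def null_space_basis(A):
    """One basis vector of Nul A for each free variable of Ax = 0."""
    R, pivots = rref(A)
    n = len(A[0])
    basis = []
    for f in (j for j in range(n) if j not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)                 # set this free variable to 1
        for row, p in zip(R, pivots):      # solve for each basic variable
            v[p] = -row[f]
        basis.append(v)
    return basis

R0 = [[1, -2, 0, -1, 3],
      [0, 0, 1, 2, -2],
      [0, 0, 0, 0, 0]]
for vec in null_space_basis(R0):
    print([int(t) for t in vec])
# [2, 1, 0, 0, 0]
# [1, 0, -2, 1, 0]
# [-3, 0, 2, 0, 1]
```

The three vectors printed are $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ from equation (1).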


Finding a basis for the column space of a matrix is actually less work than finding 
a basis for the null space. However, the method requires some explanation. Let’s begin 
with a simple case. 


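EXAMPLE 7 Find a basis for the column space of the matrix

$$B = \begin{bmatrix} 1 & 0 & -3 & 5 & 0 \\ 0 & 1 & 2 & -1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

SOLUTION Denote the columns of $B$ by $\mathbf{b}_1, \ldots, \mathbf{b}_5$ and note that $\mathbf{b}_3 = -3\mathbf{b}_1 + 2\mathbf{b}_2$ and $\mathbf{b}_4 = 5\mathbf{b}_1 - \mathbf{b}_2$. The fact that $\mathbf{b}_3$ and $\mathbf{b}_4$ are combinations of the pivot columns means that any combination of $\mathbf{b}_1, \ldots, \mathbf{b}_5$ is actually just a combination of $\mathbf{b}_1$, $\mathbf{b}_2$, and $\mathbf{b}_5$. Indeed, if $\mathbf{v}$ is any vector in Col $B$, say,

$$\mathbf{v} = c_1\mathbf{b}_1 + c_2\mathbf{b}_2 + c_3\mathbf{b}_3 + c_4\mathbf{b}_4 + c_5\mathbf{b}_5$$

then, substituting for $\mathbf{b}_3$ and $\mathbf{b}_4$, we can write $\mathbf{v}$ in the form

$$\mathbf{v} = c_1\mathbf{b}_1 + c_2\mathbf{b}_2 + c_3(-3\mathbf{b}_1 + 2\mathbf{b}_2) + c_4(5\mathbf{b}_1 - \mathbf{b}_2) + c_5\mathbf{b}_5$$

which is a linear combination of $\mathbf{b}_1$, $\mathbf{b}_2$, and $\mathbf{b}_5$. So $\{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_5\}$ spans Col $B$. Also, $\mathbf{b}_1$, $\mathbf{b}_2$, and $\mathbf{b}_5$ are linearly independent, because they are columns from an identity matrix. So the pivot columns of $B$ form a basis for Col $B$. ■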
The matrix B in Example 7 is in reduced echelon form. To handle a general matrix
A, recall that linear dependence relations among the columns of A can be expressed 
in the form Ax = 0 for some x. (If some columns are not involved in a particular 
dependence relation, then the corresponding entries in x are zero.) When A is row 
reduced to echelon form B , the columns are drastically changed, but the equations 
Ax = 0 and Bx = 0 have the same set of solutions. That is, the columns of A have 
exactly the same linear dependence relationships as the columns of B . 

EXAMPLE 8 It can be verified that the matrix

$$A = [\mathbf{a}_1 \ \ \mathbf{a}_2 \ \ \cdots \ \ \mathbf{a}_5] = \begin{bmatrix} 1 & 3 & 3 & 2 & -9 \\ -2 & -2 & 2 & -8 & 2 \\ 2 & 3 & 0 & 7 & 1 \\ 3 & 4 & -1 & 11 & -8 \end{bmatrix}$$

is row equivalent to the matrix $B$ in Example 7. Find a basis for Col $A$.

SOLUTION From Example 7, the pivot columns of $A$ are columns 1, 2, and 5. Also, $\mathbf{b}_3 = -3\mathbf{b}_1 + 2\mathbf{b}_2$ and $\mathbf{b}_4 = 5\mathbf{b}_1 - \mathbf{b}_2$. Since row operations do not affect linear dependence relations among the columns of the matrix, we should have

$$\mathbf{a}_3 = -3\mathbf{a}_1 + 2\mathbf{a}_2 \quad \text{and} \quad \mathbf{a}_4 = 5\mathbf{a}_1 - \mathbf{a}_2$$

Check that this is true! By the argument in Example 7, $\mathbf{a}_3$ and $\mathbf{a}_4$ are not needed to generate the column space of $A$. Also, $\{\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_5\}$ must be linearly independent, because any dependence relation among $\mathbf{a}_1$, $\mathbf{a}_2$, and $\mathbf{a}_5$ would imply the same dependence relation among $\mathbf{b}_1$, $\mathbf{b}_2$, and $\mathbf{b}_5$. Since $\{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_5\}$ is linearly independent, $\{\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_5\}$ is also linearly independent and hence is a basis for Col $A$. ■

The argument in Example 8 can be adapted to prove the following theorem.


THEOREM 13 


The pivot columns of a matrix A form a basis for the column space of A . 


Warning: Be careful to use pivot columns of A itself for the basis of Col A. The 
columns of an echelon form B are often not in the column space of A. (For instance, 
in Examples 7 and 8, the columns of B all have zeros in their last entries and cannot 
generate the columns of A .) 
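Theorem 13 and the warning can be illustrated with a small Python sketch: row reduction is used only to locate the pivot positions, and the basis vectors are then taken from $A$ itself. The helper names `rref` and `column_space_basis` are assumptions of this sketch; the matrix is the one from Example 8.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced echelon form, list of pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [x / piv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def column_space_basis(A):
    """Pivot columns of A itself (not of its echelon form)."""
    _, pivots = rref(A)
    return [[row[j] for row in A] for j in pivots]

A = [[1, 3, 3, 2, -9],
     [-2, -2, 2, -8, 2],
     [2, 3, 0, 7, 1],
     [3, 4, -1, 11, -8]]

print(rref(A)[1])              # [0, 1, 4], i.e. columns 1, 2, and 5
print(column_space_basis(A))   # [[1, -2, 2, 3], [3, -2, 3, 4], [-9, 2, 1, -8]]
```

Note that the indices returned are zero-based, so `[0, 1, 4]` corresponds to columns 1, 2, and 5 in the notation of the example.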



Mastering: Subspace, 

Col A Nul A, Basis 2-37 


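PRACTICE PROBLEMS

1. Let $A = \begin{bmatrix} 1 & -1 & 5 \\ 2 & 0 & 7 \\ -3 & -5 & -3 \end{bmatrix}$ and $\mathbf{u} = \begin{bmatrix} -7 \\ 3 \\ 2 \end{bmatrix}$. Is $\mathbf{u}$ in Nul $A$? Is $\mathbf{u}$ in Col $A$? Justify each answer.

2. Given $A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$, find a vector in Nul $A$ and a vector in Col $A$.

3. Suppose an $n \times n$ matrix $A$ is invertible. What can you say about Col $A$? About Nul $A$?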
2.8 EXERCISES

Exercises 1-4 display sets in $\mathbb{R}^2$. Assume the sets include the bounding lines. In each case, give a specific reason why the set $H$ is not a subspace of $\mathbb{R}^2$. (For instance, find two vectors in $H$ whose sum is not in $H$, or find a vector in $H$ with a scalar multiple that is not in $H$. Draw a picture.)

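1.

7. Let $\mathbf{v}_1 = \begin{bmatrix} 2 \\ -8 \\ 6 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} -3 \\ 8 \\ -7 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} -4 \\ 6 \\ -7 \end{bmatrix}$, and $A = [\mathbf{v}_1 \ \ \mathbf{v}_2 \ \ \mathbf{v}_3]$.

a. How many vectors are in $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$?

b. How many vectors are in Col $A$?

c. Is $\mathbf{p}$ in Col $A$? Why or why not?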



~-3“ 


~-2“ 


0 


8. Let Vi = 

0 

6 

, v 2 = 

2 

3 

, v 3 = 

i 

U> 

1_ 

, and p 



. Determine if p is in Col A, where A = [vi v 2 v 3 ]. 


9. With A and p as in Exercise 7, determine if p is in Nul A. 


10. With u = (—2, 3,1) and A as in Exercise 8, determine if u is 
in Nul A. 


In Exercises 11 and 12, give integers $p$ and $q$ such that Nul $A$ is a subspace of $\mathbb{R}^p$ and Col $A$ is a subspace of $\mathbb{R}^q$.





2 


~-4~ 


8" 

5. Let vi = 

i 

Lh UJ 

1_ 

, V 2 = 

-5 

8 

, and w = 

2 

-9 


. Deter 


mine if w is in the subspace of R 3 generated by yi and v 2 . 


Vi = 

1 " 
-2 

4 

, V 2 = 

4“ 

-7 

9 

, v 3 = 

5" 

-8 

6 


i 

LO 

1_ 


7 


i 

Ul 

1_ 


, and u = 


-4 

10 

-7 


. Determine if u is in the subspace of R 4 generated 


_ —5 _ 

by {vi,v 2 ,v 3 }. 



12. A = 





13. For A as in Exercise 11, find a nonzero vector in Nul A and a 
nonzero vector in Col A. 


14. For A as in Exercise 12, find a nonzero vector in Nul A and a 
nonzero vector in Col A. 


Determine which sets in Exercises 15-20 are bases for $\mathbb{R}^2$ or $\mathbb{R}^3$. Justify each answer.



1 


i 

_1 


i 

<N 

1_ 


i 

o 

i_ 

-6 


-4 


7 


OC 

i 

i_ 


7 


5 


i 

_1 


20 . 
In Exercises 21 and 22, mark each statement True or False. Justify each answer.

21. a. A subspace of $\mathbb{R}^n$ is any set $H$ such that (i) the zero vector is in $H$, (ii) $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{u} + \mathbf{v}$ are in $H$, and (iii) $c$ is a scalar and $c\mathbf{u}$ is in $H$.

b. If $\mathbf{v}_1, \ldots, \mathbf{v}_p$ are in $\mathbb{R}^n$, then Span$\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ is the same as the column space of the matrix $[\mathbf{v}_1 \ \cdots \ \mathbf{v}_p]$.

c. The set of all solutions of a system of $m$ homogeneous equations in $n$ unknowns is a subspace of $\mathbb{R}^m$.

d. The columns of an invertible $n \times n$ matrix form a basis for $\mathbb{R}^n$.

e. Row operations do not affect linear dependence relations among the columns of a matrix.

22. a. A subset $H$ of $\mathbb{R}^n$ is a subspace if the zero vector is in $H$.

b. Given vectors $\mathbf{v}_1, \ldots, \mathbf{v}_p$ in $\mathbb{R}^n$, the set of all linear combinations of these vectors is a subspace of $\mathbb{R}^n$.

c. The null space of an $m \times n$ matrix is a subspace of $\mathbb{R}^n$.

d. The column space of a matrix $A$ is the set of solutions of $A\mathbf{x} = \mathbf{b}$.


26. A = 





3 

7 

3 

3 

6 

3 

1 

0 


9 

5 

4 

7 


27. Construct a nonzero $3 \times 3$ matrix $A$ and a nonzero vector $\mathbf{b}$ such that $\mathbf{b}$ is in Col $A$, but $\mathbf{b}$ is not the same as any one of the columns of $A$.

28. Construct a nonzero $3 \times 3$ matrix $A$ and a vector $\mathbf{b}$ such that $\mathbf{b}$ is not in Col $A$.

29. Construct a nonzero $3 \times 3$ matrix $A$ and a nonzero vector $\mathbf{b}$ such that $\mathbf{b}$ is in Nul $A$.

30. Suppose the columns of a matrix $A = [\mathbf{a}_1 \ \cdots \ \mathbf{a}_p]$ are linearly independent. Explain why $\{\mathbf{a}_1, \ldots, \mathbf{a}_p\}$ is a basis for Col $A$.

In Exercises 31-36, respond as comprehensively as possible, and 
justify your answer. 


e. If B is an echelon form of a matrix A, then the pivot 
columns of B form a basis for Col A. 


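Exercises 23-26 display a matrix $A$ and an echelon form of $A$. Find a basis for Col $A$ and a basis for Nul $A$.

23. $A = \begin{bmatrix} 4 & 5 & 9 & -2 \\ 6 & 5 & 1 & 12 \\ 3 & 4 & 8 & -3 \end{bmatrix} \sim \begin{bmatrix} 1 & 2 & 6 & -5 \\ 0 & 1 & 5 & -6 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

24. $A = \begin{bmatrix} -3 & 9 & -2 & -7 \\ 2 & -6 & 4 & 8 \\ 3 & -9 & -2 & 2 \end{bmatrix} \sim \begin{bmatrix} 1 & -3 & 6 & 9 \\ 0 & 0 & 4 & 5 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

25. $A = \begin{bmatrix} 1 & 4 & 8 & -3 & -7 \\ -1 & 2 & 7 & 3 & 4 \\ -2 & 2 & 9 & 5 & 5 \\ 3 & 6 & 9 & -5 & -2 \end{bmatrix} \sim \begin{bmatrix} 1 & 4 & 8 & 0 & 5 \\ 0 & 2 & 5 & 0 & -1 \\ 0 & 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$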

31. Suppose $F$ is a $5 \times 5$ matrix whose column space is not equal to $\mathbb{R}^5$. What can you say about Nul $F$?

32. If $R$ is a $6 \times 6$ matrix and Nul $R$ is not the zero subspace, what can you say about Col $R$?

33. If $Q$ is a $4 \times 4$ matrix and Col $Q = \mathbb{R}^4$, what can you say about solutions of equations of the form $Q\mathbf{x} = \mathbf{b}$ for $\mathbf{b}$ in $\mathbb{R}^4$?

34. If $P$ is a $5 \times 5$ matrix and Nul $P$ is the zero subspace, what can you say about solutions of equations of the form $P\mathbf{x} = \mathbf{b}$ for $\mathbf{b}$ in $\mathbb{R}^5$?

35. What can you say about Nul $B$ when $B$ is a $5 \times 4$ matrix with linearly independent columns?

36. What can you say about the shape of an $m \times n$ matrix $A$ when the columns of $A$ form a basis for $\mathbb{R}^m$?


[M] In Exercises 37 and 38, construct bases for the column space 
and the null space of the given matrix A. Justify your work. 


37. A = 


38. A = 










-8 

-9 

19 

5 


WEB 


Column Space and Null Space 


WEB 


A Basis for Col A 
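SOLUTIONS TO PRACTICE PROBLEMS

1. To determine whether $\mathbf{u}$ is in Nul $A$, simply compute

$$A\mathbf{u} = \begin{bmatrix} 1 & -1 & 5 \\ 2 & 0 & 7 \\ -3 & -5 & -3 \end{bmatrix} \begin{bmatrix} -7 \\ 3 \\ 2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

The result shows that $\mathbf{u}$ is in Nul $A$. Deciding whether $\mathbf{u}$ is in Col $A$ requires more work. Reduce the augmented matrix $[A \ \ \mathbf{u}]$ to echelon form to determine whether the equation $A\mathbf{x} = \mathbf{u}$ is consistent:

$$\begin{bmatrix} 1 & -1 & 5 & -7 \\ 2 & 0 & 7 & 3 \\ -3 & -5 & -3 & 2 \end{bmatrix} \sim \begin{bmatrix} 1 & -1 & 5 & -7 \\ 0 & 2 & -3 & 17 \\ 0 & -8 & 12 & -19 \end{bmatrix} \sim \begin{bmatrix} 1 & -1 & 5 & -7 \\ 0 & 2 & -3 & 17 \\ 0 & 0 & 0 & 49 \end{bmatrix}$$

The equation $A\mathbf{x} = \mathbf{u}$ has no solution, so $\mathbf{u}$ is not in Col $A$.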

2. In contrast to Practice Problem 1, finding a vector in Nul $A$ requires more work than testing whether a specified vector is in Nul $A$. However, since $A$ is already in reduced echelon form, the equation $A\mathbf{x} = \mathbf{0}$ shows that if $\mathbf{x} = (x_1, x_2, x_3)$, then $x_2 = 0$, $x_3 = 0$, and $x_1$ is a free variable. Thus, a basis for Nul $A$ is $\mathbf{v} = (1, 0, 0)$. Finding just one vector in Col $A$ is trivial, since each column of $A$ is in Col $A$. In this particular case, the same vector $\mathbf{v}$ is in both Nul $A$ and Col $A$. For most $n \times n$ matrices, the zero vector of $\mathbb{R}^n$ is the only vector in both Nul $A$ and Col $A$.

3. If $A$ is invertible, then the columns of $A$ span $\mathbb{R}^n$, by the Invertible Matrix Theorem. By definition, the columns of any matrix always span the column space, so in this case Col $A$ is all of $\mathbb{R}^n$. In symbols, Col $A = \mathbb{R}^n$. Also, since $A$ is invertible, the equation $A\mathbf{x} = \mathbf{0}$ has only the trivial solution. This means that Nul $A$ is the zero subspace. In symbols, Nul $A = \{\mathbf{0}\}$.


2.9 DIMENSION AND RANK 


This section continues the discussion of subspaces and bases for subspaces, beginning with the concept of a coordinate system. The definition and example below should make a useful new term, dimension, seem quite natural, at least for subspaces of $\mathbb{R}^3$.

Coordinate Systems

The main reason for selecting a basis for a subspace $H$, instead of merely a spanning set, is that each vector in $H$ can be written in only one way as a linear combination of the basis vectors. To see why, suppose $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_p\}$ is a basis for $H$, and suppose a vector $\mathbf{x}$ in $H$ can be generated in two ways, say,

$$\mathbf{x} = c_1\mathbf{b}_1 + \cdots + c_p\mathbf{b}_p \quad \text{and} \quad \mathbf{x} = d_1\mathbf{b}_1 + \cdots + d_p\mathbf{b}_p \tag{1}$$

Then, subtracting gives

$$\mathbf{0} = \mathbf{x} - \mathbf{x} = (c_1 - d_1)\mathbf{b}_1 + \cdots + (c_p - d_p)\mathbf{b}_p \tag{2}$$

Since $\mathcal{B}$ is linearly independent, the weights in (2) must all be zero. That is, $c_j = d_j$ for $1 \le j \le p$, which shows that the two representations in (1) are actually the same.
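DEFINITION

Suppose the set $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_p\}$ is a basis for a subspace $H$. For each $\mathbf{x}$ in $H$, the coordinates of $\mathbf{x}$ relative to the basis $\mathcal{B}$ are the weights $c_1, \ldots, c_p$ such that $\mathbf{x} = c_1\mathbf{b}_1 + \cdots + c_p\mathbf{b}_p$, and the vector in $\mathbb{R}^p$

$$[\mathbf{x}]_{\mathcal{B}} = \begin{bmatrix} c_1 \\ \vdots \\ c_p \end{bmatrix}$$

is called the coordinate vector of $\mathbf{x}$ (relative to $\mathcal{B}$) or the $\mathcal{B}$-coordinate vector of $\mathbf{x}$. 1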



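EXAMPLE 1 Let $\mathbf{v}_1 = \begin{bmatrix} 3 \\ 6 \\ 2 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$, $\mathbf{x} = \begin{bmatrix} 3 \\ 12 \\ 7 \end{bmatrix}$, and $\mathcal{B} = \{\mathbf{v}_1, \mathbf{v}_2\}$. Then $\mathcal{B}$ is a basis for $H = \text{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$ because $\mathbf{v}_1$ and $\mathbf{v}_2$ are linearly independent. Determine if $\mathbf{x}$ is in $H$, and if it is, find the coordinate vector of $\mathbf{x}$ relative to $\mathcal{B}$.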


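SOLUTION If $\mathbf{x}$ is in $H$, then the following vector equation is consistent:

$$c_1 \begin{bmatrix} 3 \\ 6 \\ 2 \end{bmatrix} + c_2 \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 12 \\ 7 \end{bmatrix}$$

The scalars $c_1$ and $c_2$, if they exist, are the $\mathcal{B}$-coordinates of $\mathbf{x}$. Row operations show that

$$\begin{bmatrix} 3 & -1 & 3 \\ 6 & 0 & 12 \\ 2 & 1 & 7 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 3 \\ 0 & 0 & 0 \end{bmatrix}$$

Thus $c_1 = 2$, $c_2 = 3$, and $[\mathbf{x}]_{\mathcal{B}} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$. The basis $\mathcal{B}$ determines a “coordinate system” on $H$, which can be visualized by the grid shown in Figure 1.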



FIGURE 1 A coordinate system on a plane $H$ in $\mathbb{R}^3$.
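Finding a coordinate vector amounts to solving the vector equation by row reduction, as in Example 1. The sketch below uses hypothetical helper names (`rref`, `coordinates`) and exact fractions; it returns `None` when the vector is not in the span of the basis.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced echelon form, list of pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [x / piv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def coordinates(basis, x):
    """B-coordinate vector of x, or None if x is not in the span of the basis."""
    p = len(basis)
    # Augmented matrix [b1 ... bp | x], built row by row.
    augmented = [[b[i] for b in basis] + [x[i]] for i in range(len(x))]
    R, pivots = rref(augmented)
    if p in pivots:
        return None                       # inconsistent: x is not in H
    if pivots != list(range(p)):
        raise ValueError("the given vectors are not linearly independent")
    return [R[i][p] for i in range(p)]

v1, v2 = [3, 6, 2], [-1, 0, 1]
print(coordinates([v1, v2], [3, 12, 7]) == [2, 3])   # True, as in Example 1
print(coordinates([v1, v2], [3, 12, 8]))             # None: not in H
```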


1 It is important that the elements of $\mathcal{B}$ are numbered because the entries in $[\mathbf{x}]_{\mathcal{B}}$ depend on the order of the vectors in $\mathcal{B}$.
Notice that although points in $H$ are also in $\mathbb{R}^3$, they are completely determined by their coordinate vectors, which belong to $\mathbb{R}^2$. The grid on the plane in Figure 1 makes $H$ “look” like $\mathbb{R}^2$. The correspondence $\mathbf{x} \mapsto [\mathbf{x}]_{\mathcal{B}}$ is a one-to-one correspondence between $H$ and $\mathbb{R}^2$ that preserves linear combinations. We call such a correspondence an isomorphism, and we say that $H$ is isomorphic to $\mathbb{R}^2$.

In general, if $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_p\}$ is a basis for $H$, then the mapping $\mathbf{x} \mapsto [\mathbf{x}]_{\mathcal{B}}$ is a one-to-one correspondence that makes $H$ look and act the same as $\mathbb{R}^p$ (even though the vectors in $H$ themselves may have more than $p$ entries). (Section 4.4 has more details.)

The Dimension of a Subspace 

It can be shown that if a subspace H has a basis of p vectors, then every basis of H must 
consist of exactly p vectors. (See Exercises 27 and 28.) Thus the following definition 
makes sense. 

DEFINITION

The dimension of a nonzero subspace $H$, denoted by dim $H$, is the number of vectors in any basis for $H$. The dimension of the zero subspace $\{\mathbf{0}\}$ is defined to be zero. 2

The space $\mathbb{R}^n$ has dimension $n$. Every basis for $\mathbb{R}^n$ consists of $n$ vectors. A plane through $\mathbf{0}$ in $\mathbb{R}^3$ is two-dimensional, and a line through $\mathbf{0}$ is one-dimensional.

EXAMPLE 2 Recall that the null space of the matrix $A$ in Example 6 in Section 2.8 had a basis of 3 vectors. So the dimension of Nul $A$ in this case is 3. Observe how each basis vector corresponds to a free variable in the equation $A\mathbf{x} = \mathbf{0}$. Our construction always produces a basis in this way. So, to find the dimension of Nul $A$, simply identify and count the number of free variables in $A\mathbf{x} = \mathbf{0}$. ■

DEFINITION

The rank of a matrix $A$, denoted by rank $A$, is the dimension of the column space of $A$.

Since the pivot columns of $A$ form a basis for Col $A$, the rank of $A$ is just the number of pivot columns in $A$.


EXAMPLE 3 Determine the rank of the matrix 



SOLUTION Reduce A to echelon form: 


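\[
A \sim
\begin{bmatrix} 2 & 5 & -3 & -4 & 8 \\ 0 & -3 & 2 & 5 & -7 \\ 0 & -6 & 4 & 14 & -20 \\ 0 & -9 & 6 & 5 & -6 \end{bmatrix}
\sim
\begin{bmatrix} 2 & 5 & -3 & -4 & 8 \\ 0 & -3 & 2 & 5 & -7 \\ 0 & 0 & 0 & 4 & -6 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\]

Pivot columns: columns 1, 2, and 4 (leading entries 2, -3, and 4).

The matrix A has 3 pivot columns, so rank A = 3. ■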
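The pivot count in Example 3 can be reproduced with a short exact-arithmetic sketch. The helper name `pivot_columns` is an assumption made for this illustration, and the input is the first matrix in the row reduction of Example 3 (which is row equivalent to A, so it has the same pivot columns).

```python
from fractions import Fraction

def pivot_columns(matrix):
    """Row reduce with exact fractions; return the 0-based pivot column indices."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    pivots, r = [], 0
    for c in range(len(rows[0])):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    return pivots

# First matrix in the row reduction of Example 3.
M = [[2, 5, -3, -4, 8],
     [0, -3, 2, 5, -7],
     [0, -6, 4, 14, -20],
     [0, -9, 6, 5, -6]]
pivots = pivot_columns(M)
rank = len(pivots)
print(pivots, rank)  # indices are 0-based, so column 1 of the text is index 0
```

Exact fractions are used on purpose: with floating-point entries the test `!= 0` is unreliable.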



2 The zero subspace has no basis (because the zero vector by itself forms a linearly dependent set).


The row reduction in Example 3 reveals that there are two free variables in Ax = 0,
because two of the five columns of A are not pivot columns. (The nonpivot columns
correspond to the free variables in Ax = 0.) Since the number of pivot columns plus the
number of nonpivot columns is exactly the number of columns, the dimensions of Col A
and Nul A have the following useful connection. (See the Rank Theorem in Section 4.6
for additional details.)


THEOREM 14   The Rank Theorem

If a matrix A has n columns, then rank A + dim Nul A = n.
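As a quick worked instance, here is the Rank Theorem applied to the matrix of Example 3, which has n = 5 columns, 3 pivot columns, and 2 free variables:

```latex
\[
\operatorname{rank} A + \dim \operatorname{Nul} A = 3 + 2 = 5 = n .
\]
```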


The following theorem is important for applications and will be needed in
Chapters 5 and 6. The theorem (proved in Section 4.5) is certainly plausible, if you
think of a $p$-dimensional subspace as isomorphic to $\mathbb{R}^p$. The Invertible Matrix Theorem
shows that $p$ vectors in $\mathbb{R}^p$ are linearly independent if and only if they also span $\mathbb{R}^p$.


THEOREM 15   The Basis Theorem

Let $H$ be a $p$-dimensional subspace of $\mathbb{R}^n$. Any linearly independent set of exactly
$p$ elements in $H$ is automatically a basis for $H$. Also, any set of $p$ elements of $H$
that spans $H$ is automatically a basis for $H$.


Rank and the Invertible Matrix Theorem 

The various vector space concepts associated with a matrix provide several more 
statements for the Invertible Matrix Theorem. They are presented below to follow the 
statements in the original theorem in Section 2.3. 


The Invertible Matrix Theorem (continued) 

Let A be an $n \times n$ matrix. Then the following statements are each equivalent to
the statement that A is an invertible matrix.

m. The columns of A form a basis of $\mathbb{R}^n$.

n. Col A = $\mathbb{R}^n$

o. dim Col A = n

p. rank A = n

q. Nul A = {0}

r. dim Nul A = 0

PROOF Statement (m) is logically equivalent to statements (e) and (h) regarding linear
independence and spanning. The other five statements are linked to the earlier ones of
the theorem by the following chain of almost trivial implications:

\[ (g) \Rightarrow (n) \Rightarrow (o) \Rightarrow (p) \Rightarrow (r) \Rightarrow (q) \Rightarrow (d) \]

Statement (g), which says that the equation Ax = b has at least one solution for each
b in $\mathbb{R}^n$, implies statement (n), because Col A is precisely the set of all b such that
the equation Ax = b is consistent. The implications (n) $\Rightarrow$ (o) $\Rightarrow$ (p) follow from the
definitions of dimension and rank. If the rank of A is n, the number of columns of A,
then dim Nul A = 0, by the Rank Theorem, and so Nul A = {0}. Thus (p) $\Rightarrow$ (r) $\Rightarrow$ (q).
Also, statement (q) implies that the equation Ax = 0 has only the trivial solution, which
is statement (d). Since statements (d) and (g) are already known to be equivalent to the
statement that A is invertible, the proof is complete. ■




NUMERICAL NOTES 


Many algorithms discussed in this text are useful for understanding concepts 
and making simple computations by hand. However, the algorithms are often 
unsuitable for large-scale problems in real life. 

Rank determination is a good example. It would seem easy to reduce a matrix
to echelon form and count the pivots. But unless exact arithmetic is performed
on a matrix whose entries are specified exactly, row operations can change the
apparent rank of a matrix. For instance, if the value of $x$ in the matrix
$\begin{bmatrix} 5 & 7 \\ 5 & x \end{bmatrix}$
is not stored exactly as 7 in a computer, then the rank may be 1 or 2, depending
on whether the computer treats $x - 7$ as zero.

In practical applications, the effective rank of a matrix A is often determined 
from the singular value decomposition of A, to be discussed in Section 7.4. 
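The sensitivity described in this note can be seen in a small floating-point experiment. This is a sketch only: the function `numerical_rank`, its tolerance parameter `tol`, and the perturbation 1e-12 are assumptions made for illustration, and counting pivots against a tolerance is a simple stand-in, not the singular value method mentioned above.

```python
def numerical_rank(matrix, tol):
    """Count pivots, treating any entry with absolute value <= tol as zero."""
    rows = [list(map(float, row)) for row in matrix]
    rank = 0
    for c in range(len(rows[0])):
        # Partial pivoting: pick the largest remaining entry in column c.
        p = max(range(rank, len(rows)), key=lambda i: abs(rows[i][c]), default=None)
        if p is None or abs(rows[p][c]) <= tol:
            continue
        rows[rank], rows[p] = rows[p], rows[rank]
        for i in range(rank + 1, len(rows)):
            f = rows[i][c] / rows[rank][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

x = 7 + 1e-12                      # x is "almost" 7, as if stored inexactly
M = [[5, 7], [5, x]]
rank_strict = numerical_rank(M, tol=0.0)   # x - 7 is treated as nonzero
rank_loose = numerical_rank(M, tol=1e-9)   # x - 7 is treated as zero
print(rank_strict, rank_loose)
```

The same matrix is reported with two different ranks, depending only on where the line between "zero" and "nonzero" is drawn.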


PRACTICE PROBLEMS 


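1. Determine the dimension of the subspace $H$ of $\mathbb{R}^3$ spanned by the vectors $\mathbf{v}_1$, $\mathbf{v}_2$,
and $\mathbf{v}_3$. (First, find a basis for $H$.)

\[
\mathbf{v}_1 = \begin{bmatrix} 2 \\ -8 \\ 6 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} 3 \\ -7 \\ -1 \end{bmatrix}, \quad
\mathbf{v}_3 = \begin{bmatrix} -1 \\ 6 \\ -7 \end{bmatrix}
\]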

2. Consider the basis 




for $\mathbb{R}^2$. If $[\mathbf{x}]_{\mathcal{B}} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$, what is $\mathbf{x}$?


3. Could $\mathbb{R}^3$ possibly contain a four-dimensional subspace? Explain.


2.9 EXERCISES 


In Exercises 1 and 2, find the vector $\mathbf{x}$ determined by the given
coordinate vector $[\mathbf{x}]_{\mathcal{B}}$ and the given basis $\mathcal{B}$. Illustrate your
answer with a figure, as in the solution of Practice Problem 2.


1 . B = 





2 . B = 



3. hi = 

1 1 

r-H ^ 

1_1 

, b 2 = 

~-2" 

7 

,x = 

1 1 

1 _ 1 

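4. $\mathbf{b}_1 = \begin{bmatrix} 1 \\ -3 \end{bmatrix}$, $\mathbf{b}_2 = \begin{bmatrix} -3 \\ 5 \end{bmatrix}$, $\mathbf{x} = \begin{bmatrix} -7 \\ 5 \end{bmatrix}$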



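5. $\mathbf{b}_1 = \begin{bmatrix} 1 \\ 5 \\ -3 \end{bmatrix}$, $\mathbf{b}_2 = \begin{bmatrix} -3 \\ -7 \\ 5 \end{bmatrix}$, $\mathbf{x} = \begin{bmatrix} 4 \\ 10 \\ -7 \end{bmatrix}$

6. $\mathbf{b}_1 = \begin{bmatrix} -3 \\ 1 \\ -4 \end{bmatrix}$, $\mathbf{b}_2 = \begin{bmatrix} 7 \\ 5 \\ -6 \end{bmatrix}$, $\mathbf{x} = \begin{bmatrix} 11 \\ 0 \\ 7 \end{bmatrix}$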

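In Exercises 3-6, the vector $\mathbf{x}$ is in a subspace $H$ with a basis
$\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$. Find the $\mathcal{B}$-coordinate vector of $\mathbf{x}$.

7. Let $\mathbf{b}_1 = \begin{bmatrix} 3 \\ 0 \end{bmatrix}$, $\mathbf{b}_2 = \begin{bmatrix} -1 \\ 2 \end{bmatrix}$, $\mathbf{w} = \begin{bmatrix} 7 \\ -2 \end{bmatrix}$, $\mathbf{x} = \begin{bmatrix} 4 \\ 1 \end{bmatrix}$, and
$\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$. Use the figure to estimate $[\mathbf{w}]_{\mathcal{B}}$ and $[\mathbf{x}]_{\mathcal{B}}$.
Confirm your estimate of $[\mathbf{x}]_{\mathcal{B}}$ by using it and $\{\mathbf{b}_1, \mathbf{b}_2\}$ to
compute $\mathbf{x}$.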



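8. Let $\mathbf{b}_1 = \begin{bmatrix} 0 \\ 2 \end{bmatrix}$, $\mathbf{b}_2 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$, $\mathbf{x} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$, $\mathbf{y} = \begin{bmatrix} 2 \\ 4 \end{bmatrix}$,
$\mathbf{z} = \begin{bmatrix} -1 \\ -2.5 \end{bmatrix}$, and $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$. Use the figure to estimate
$[\mathbf{x}]_{\mathcal{B}}$, $[\mathbf{y}]_{\mathcal{B}}$, and $[\mathbf{z}]_{\mathcal{B}}$. Confirm your estimates of $[\mathbf{y}]_{\mathcal{B}}$ and $[\mathbf{z}]_{\mathcal{B}}$
by using them and $\{\mathbf{b}_1, \mathbf{b}_2\}$ to compute $\mathbf{y}$ and $\mathbf{z}$.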



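Exercises 9-12 display a matrix A and an echelon form of A. Find
bases for Col A and Nul A, and then state the dimensions of these
subspaces.

9. \[ A = \begin{bmatrix} 1 & -3 & 2 & -4 \\ -3 & 9 & -1 & 5 \\ 2 & -6 & 4 & -3 \\ -4 & 12 & 2 & 7 \end{bmatrix} \sim \begin{bmatrix} 1 & -3 & 2 & -4 \\ 0 & 0 & 5 & -7 \\ 0 & 0 & 0 & 5 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]

10. \[ A = \begin{bmatrix} 1 & -2 & 9 & 5 & 4 \\ 1 & -1 & 6 & 5 & -3 \\ -2 & 0 & -6 & 1 & -2 \\ 4 & 1 & 9 & 1 & -9 \end{bmatrix} \sim \begin{bmatrix} 1 & -2 & 9 & 5 & 4 \\ 0 & 1 & -3 & 0 & -7 \\ 0 & 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \]

11. \[ A = \begin{bmatrix} 1 & 2 & -5 & 0 & -1 \\ 2 & 5 & -8 & 4 & 3 \\ -3 & -9 & 9 & -7 & -2 \\ 3 & 10 & -7 & 11 & 7 \end{bmatrix} \sim \begin{bmatrix} 1 & 2 & -5 & 0 & -1 \\ 0 & 1 & 2 & 4 & 5 \\ 0 & 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \]

12. \[ A = \begin{bmatrix} 1 & 2 & -4 & 3 & 3 \\ 5 & 10 & -9 & -7 & 8 \\ 4 & 8 & -9 & -2 & 7 \\ -2 & -4 & 5 & 0 & -6 \end{bmatrix} \sim \begin{bmatrix} 1 & 2 & -4 & 3 & 3 \\ 0 & 0 & 1 & -2 & 0 \\ 0 & 0 & 0 & 0 & -5 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \]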

In Exercises 13 and 14, find a basis for the subspace spanned by 
the given vectors. What is the dimension of the subspace? 


13. 


14. 


1 


-3 


1 

<N 

1_ 


1- 

1_ 

-3 


9 


-1 


5 

2 

•> 

-6 

5 

4 


-3 

i 

_i 


i 

CN 

_1 


i 

<N 

_1 


7 

1 


1 

<N 

1_ 


1 

O 

1_ 


-1 

-1 


-3 


2 


4 

-2 

•> 

-1 

5 

-6 

9 

-7 

i 

1_ 


i 

_1 


i 

oo 

_1 


7 


3 

-8 

9 

-5 


15. Suppose a $3 \times 5$ matrix A has three pivot columns. Is Col
A = $\mathbb{R}^3$? Is Nul A = $\mathbb{R}^2$? Explain your answers.

16. Suppose a $4 \times 7$ matrix A has three pivot columns. Is Col
A = $\mathbb{R}^3$? What is the dimension of Nul A? Explain your
answers.

In Exercises 17 and 18, mark each statement True or False. Justify 
each answer. Here A is an m x n matrix. 

17. a. If $\mathcal{B} = \{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ is a basis for a subspace $H$ and if
$\mathbf{x} = c_1\mathbf{v}_1 + \cdots + c_p\mathbf{v}_p$, then $c_1, \dots, c_p$ are the coordinates
of $\mathbf{x}$ relative to the basis $\mathcal{B}$.

b. Each line in $\mathbb{R}^n$ is a one-dimensional subspace of $\mathbb{R}^n$.

c. The dimension of Col A is the number of pivot columns
of A.

d. The dimensions of Col A and Nul A add up to the number
of columns of A.

e. If a set of $p$ vectors spans a $p$-dimensional subspace $H$ of
$\mathbb{R}^n$, then these vectors form a basis for $H$.

18. a. If $\mathcal{B}$ is a basis for a subspace $H$, then each vector in $H$ can
be written in only one way as a linear combination of the
vectors in $\mathcal{B}$.

b. If $\mathcal{B} = \{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ is a basis for a subspace $H$ of $\mathbb{R}^n$, then
the correspondence $\mathbf{x} \mapsto [\mathbf{x}]_{\mathcal{B}}$ makes $H$ look and act the
same as $\mathbb{R}^p$.

c. The dimension of Nul A is the number of variables in the
equation Ax = 0.

d. The dimension of the column space of A is rank A.

e. If $H$ is a $p$-dimensional subspace of $\mathbb{R}^n$, then a linearly
independent set of $p$ vectors in $H$ is a basis for $H$.

In Exercises 19-24, justify each answer or construction. 

19. If the subspace of all solutions of Ax = 0 has a basis consisting
of three vectors and if A is a $5 \times 7$ matrix, what is the
rank of A?

20. What is the rank of a $4 \times 5$ matrix whose null space is
three-dimensional?

21. If the rank of a $7 \times 6$ matrix A is 4, what is the dimension of
the solution space of Ax = 0?

22. Show that a set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_5\}$ in $\mathbb{R}^n$ is linearly
dependent when dim Span $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_5\}$ = 4.

23. If possible, construct a $3 \times 4$ matrix A such that dim
Nul A = 2 and dim Col A = 2.

24. Construct a $4 \times 3$ matrix with rank 1.

25. Let A be an $n \times p$ matrix whose column space is
$p$-dimensional. Explain why the columns of A must be linearly
independent.

26. Suppose columns 1, 3, 5, and 6 of a matrix A are linearly
independent (but are not necessarily pivot columns) and the
rank of A is 4. Explain why the four columns mentioned must
be a basis for the column space of A.


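27. Suppose vectors $\mathbf{b}_1, \dots, \mathbf{b}_p$ span a subspace $W$, and let
$\{\mathbf{a}_1, \dots, \mathbf{a}_q\}$ be any set in $W$ containing more than $p$
vectors. Fill in the details of the following argument to
show that $\{\mathbf{a}_1, \dots, \mathbf{a}_q\}$ must be linearly dependent. First, let
$B = [\mathbf{b}_1 \ \cdots \ \mathbf{b}_p]$ and $A = [\mathbf{a}_1 \ \cdots \ \mathbf{a}_q]$.

a. Explain why for each vector $\mathbf{a}_j$, there exists a vector $\mathbf{c}_j$
in $\mathbb{R}^p$ such that $\mathbf{a}_j = B\mathbf{c}_j$.

b. Let $C = [\mathbf{c}_1 \ \cdots \ \mathbf{c}_q]$. Explain why there is a nonzero
vector $\mathbf{u}$ such that $C\mathbf{u} = \mathbf{0}$.

c. Use $B$ and $C$ to show that $A\mathbf{u} = \mathbf{0}$. This shows that the
columns of $A$ are linearly dependent.

28. Use Exercise 27 to show that if $\mathcal{A}$ and $\mathcal{B}$ are bases for a
subspace $W$ of $\mathbb{R}^n$, then $\mathcal{A}$ cannot contain more vectors than
$\mathcal{B}$, and, conversely, $\mathcal{B}$ cannot contain more vectors than $\mathcal{A}$.

29. [M] Let $H$ = Span $\{\mathbf{v}_1, \mathbf{v}_2\}$ and $\mathcal{B} = \{\mathbf{v}_1, \mathbf{v}_2\}$. Show that $\mathbf{x}$ is
in $H$, and find the $\mathcal{B}$-coordinate vector of $\mathbf{x}$, when

\[
\mathbf{v}_1 = \begin{bmatrix} 11 \\ -5 \\ 10 \\ 7 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} 14 \\ -8 \\ 13 \\ 10 \end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} 19 \\ -13 \\ 18 \\ 15 \end{bmatrix}
\]

30. [M] Let $H$ = Span $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ and $\mathcal{B} = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. Show that
$\mathcal{B}$ is a basis for $H$ and $\mathbf{x}$ is in $H$, and find the $\mathcal{B}$-coordinate
vector of $\mathbf{x}$, when

\[
\mathbf{v}_1 = \begin{bmatrix} -6 \\ 4 \\ -9 \\ 4 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} 8 \\ -3 \\ 7 \\ -3 \end{bmatrix}, \quad
\mathbf{v}_3 = \begin{bmatrix} -9 \\ 5 \\ -8 \\ 3 \end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} 4 \\ 7 \\ -8 \\ 3 \end{bmatrix}
\]

SOLUTIONS TO PRACTICE PROBLEMS


1. Construct $A = [\mathbf{v}_1 \ \mathbf{v}_2 \ \mathbf{v}_3]$ so that the subspace spanned by $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$ is the column
space of A. A basis for this space is provided by the pivot columns of A.

\[
A = \begin{bmatrix} 2 & 3 & -1 \\ -8 & -7 & 6 \\ 6 & -1 & -7 \end{bmatrix}
\sim \begin{bmatrix} 2 & 3 & -1 \\ 0 & 5 & 2 \\ 0 & -10 & -4 \end{bmatrix}
\sim \begin{bmatrix} 2 & 3 & -1 \\ 0 & 5 & 2 \\ 0 & 0 & 0 \end{bmatrix}
\]

The first two columns of A are pivot columns and form a basis for H. Thus
dim H = 2.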


2. If $[\mathbf{x}]_{\mathcal{B}} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$, then $\mathbf{x}$ is formed from a linear combination of the basis vectors using
weights 3 and 2:


x = 3bi + 2b 2 = 3 

The basis $\{\mathbf{b}_1, \mathbf{b}_2\}$ determines a coordinate system for $\mathbb{R}^2$, illustrated by the grid in
the figure. Note how $\mathbf{x}$ is 3 units in the $\mathbf{b}_1$-direction and 2 units in the $\mathbf{b}_2$-direction.

3. A four-dimensional subspace would contain a basis of four linearly independent
vectors. This is impossible inside $\mathbb{R}^3$. Since any linearly independent set in $\mathbb{R}^3$ has
no more than three vectors, any subspace of $\mathbb{R}^3$ has dimension no more than 3. The
space $\mathbb{R}^3$ itself is the only three-dimensional subspace of $\mathbb{R}^3$. Other subspaces of $\mathbb{R}^3$
have dimension 2, 1, or 0.


CHAPTER 2 SUPPLEMENTARY EXERCISES 


1. Assume that the matrices mentioned in the statements below 

have appropriate sizes. Mark each statement True or False. 

Justify each answer. 

a. If A and B are $m \times n$, then both $AB^T$ and $A^T B$ are
defined.

b. If AB = C and C has 2 columns, then A has 2 columns.

c. Left-multiplying a matrix B by a diagonal matrix A, with
nonzero entries on the diagonal, scales the rows of B.

d. If BC = BD, then C = D.

e. If AC = 0, then either A = 0 or C = 0.

f. If A and B are $n \times n$, then $(A + B)(A - B) = A^2 - B^2$.

g. An elementary $n \times n$ matrix has either $n$ or $n + 1$
nonzero entries.

h. The transpose of an elementary matrix is an elementary
matrix.

i. An elementary matrix must be square.

j. Every square matrix is a product of elementary matrices.

k. If A is a $3 \times 3$ matrix with three pivot positions,
there exist elementary matrices $E_1, \dots, E_p$ such that
$E_p \cdots E_1 A = I$.

l. If $AB = I$, then A is invertible.

m. If A and B are square and invertible, then AB is invertible,
and $(AB)^{-1} = A^{-1}B^{-1}$.

n. If AB = BA and if A is invertible, then $A^{-1}B = BA^{-1}$.

o. If A is invertible and if $r \neq 0$, then $(rA)^{-1} = rA^{-1}$.


p. If A is a 3 x 3 matrix and the equation Ax 
a unique solution, then A is invertible. 




2. Find the matrix C whose inverse is C 1 = 


4 

6 



3. Let $A = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$. Show that $A^3 = 0$. Use matrix
algebra to compute the product $(I - A)(I + A + A^2)$.

4. Suppose $A^n = 0$ for some $n > 1$. Find an inverse for $I - A$.

5. Suppose an $n \times n$ matrix A satisfies the equation
$A^2 - 2A + I = 0$. Show that $A^3 = 3A - 2I$ and
$A^4 = 4A - 3I$.

6. Let $A = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$, $B = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$. These are Pauli spin
matrices used in the study of electron spin in quantum
mechanics. Show that $A^2 = I$, $B^2 = I$, and $AB = -BA$.
Matrices such that $AB = -BA$ are said to anticommute.


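7. Let $A = \begin{bmatrix} 1 & 3 & 8 \\ 2 & 4 & 11 \\ 1 & 2 & 5 \end{bmatrix}$ and $B = \begin{bmatrix} -3 & 5 \\ 1 & 5 \\ 3 & 4 \end{bmatrix}$. Compute
$A^{-1}B$ without computing $A^{-1}$. [Hint: $A^{-1}B$ is the solution
of the equation $AX = B$.]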


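8. Find a matrix A such that the transformation $\mathbf{x} \mapsto A\mathbf{x}$ maps
$\begin{bmatrix} 1 \\ 3 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ 7 \end{bmatrix}$ into $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 3 \\ 1 \end{bmatrix}$, respectively. [Hint: Write
a matrix equation involving A, and solve for A.]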


9. Suppose $AB = \begin{bmatrix} 5 & 4 \\ 2 & 3 \end{bmatrix}$ and $B = \begin{bmatrix} 7 & 3 \\ 2 & 1 \end{bmatrix}$. Find A.

10. Suppose A is invertible. Explain why $A^T A$ is also invertible.
Then show that $A^{-1} = (A^T A)^{-1} A^T$.


11. Let $x_1, \dots, x_n$ be fixed numbers. The matrix below, called
a Vandermonde matrix, occurs in applications such as
signal processing, error-correcting codes, and polynomial
interpolation.

\[
V = \begin{bmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\
1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^{n-1}
\end{bmatrix}
\]

Given $\mathbf{y} = (y_1, \dots, y_n)$ in $\mathbb{R}^n$, suppose $\mathbf{c} = (c_0, \dots, c_{n-1})$ in
$\mathbb{R}^n$ satisfies $V\mathbf{c} = \mathbf{y}$, and define the polynomial

\[ p(t) = c_0 + c_1 t + c_2 t^2 + \cdots + c_{n-1} t^{n-1} \]

a. Show that $p(x_1) = y_1, \dots, p(x_n) = y_n$. We call
$p(t)$ an interpolating polynomial for the points
$(x_1, y_1), \dots, (x_n, y_n)$ because the graph of $p(t)$ passes
through the points.

b. Suppose $x_1, \dots, x_n$ are distinct numbers. Show that the
columns of $V$ are linearly independent. [Hint: How many
zeros can a polynomial of degree $n - 1$ have?]

c. Prove: “If $x_1, \dots, x_n$ are distinct numbers, and $y_1, \dots, y_n$
are arbitrary numbers, then there is an interpolating polynomial
of degree $\le n - 1$ for $(x_1, y_1), \dots, (x_n, y_n)$.”
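To make the setup of Exercise 11 concrete, the sketch below builds a small Vandermonde matrix and solves V c = y exactly. The sample points, the helper name `solve`, and the use of Gauss-Jordan elimination are choices made for this illustration; a numerical check on one data set is not a substitute for the arguments requested in parts (a) through (c).

```python
from fractions import Fraction

def solve(M, y):
    """Solve the square system M c = y by Gauss-Jordan elimination (exact)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(M, y)]
    for c in range(n):
        # Raises StopIteration if M is singular (no pivot in column c).
        p = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for i in range(n):
            if i != c:
                f = aug[i][c]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[c])]
    return [row[-1] for row in aug]

# Hypothetical data: three points with distinct x-values.
xs, ys = [1, 2, 3], [12, 15, 16]
V = [[x ** k for k in range(len(xs))] for x in xs]   # rows 1, x, x^2
c = solve(V, ys)

def p(t):
    """Interpolating polynomial c0 + c1 t + c2 t^2."""
    return sum(ck * t ** k for k, ck in enumerate(c))

print(c, [p(x) for x in xs])
```

For these points the polynomial is 7 + 6t - t^2, and evaluating it at the chosen x-values returns the prescribed y-values.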

12. Let $A = LU$, where $L$ is an invertible lower triangular matrix
and $U$ is upper triangular. Explain why the first column
of A is a multiple of the first column of L. How is the second
column of A related to the columns of L?


13. Given $\mathbf{u}$ in $\mathbb{R}^n$ with $\mathbf{u}^T\mathbf{u} = 1$, let $P = \mathbf{u}\mathbf{u}^T$ (an outer product)
and $Q = I - 2P$. Justify statements (a), (b), and (c).

a. $P^2 = P$   b. $P^T = P$   c. $Q^2 = I$

The transformation $\mathbf{x} \mapsto P\mathbf{x}$ is called a projection, and
$\mathbf{x} \mapsto Q\mathbf{x}$ is called a Householder reflection. Such reflections
are used in computer programs to create multiple zeros in a
vector (usually a column of a matrix).

14. Let $\mathbf{u} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$ and $\mathbf{x} = \begin{bmatrix} 1 \\ 5 \\ 3 \end{bmatrix}$. Determine $P$ and $Q$ as in
Exercise 13, and compute $P\mathbf{x}$ and $Q\mathbf{x}$. The figure shows that
$Q\mathbf{x}$ is the reflection of $\mathbf{x}$ through the $x_1x_2$-plane.

A Householder reflection through the plane $x_3 = 0$.


15. Suppose $C = E_3 E_2 E_1 B$, where $E_1$, $E_2$, and $E_3$ are elementary
matrices. Explain why C is row equivalent to B.

16. Let A be an n x n singular matrix. Describe how to construct 
an n x n nonzero matrix B such that AB = 0. 

17. Let A be a 6 x 4 matrix and B a 4 x 6 matrix. Show that the 
6 x 6 matrix AB cannot be invertible. 

18. Suppose A is a $5 \times 3$ matrix and there exists a $3 \times 5$ matrix
C such that $CA = I_3$. Suppose further that for some given b
in $\mathbb{R}^5$, the equation Ax = b has at least one solution. Show
that this solution is unique.

19. [M] Certain dynamical systems can be studied by examining
powers of a matrix, such as those below. Determine what
happens to $A^k$ and $B^k$ as $k$ increases (for example, try
$k = 2, \dots, 16$). Try to identify what is special about A and
B. Investigate large powers of other matrices of this type, and
make a conjecture about such matrices.

\[
A = \begin{bmatrix} .4 & .2 & .3 \\ .3 & .6 & .3 \\ .3 & .2 & .4 \end{bmatrix}, \quad
B = \begin{bmatrix} 0 & .2 & .3 \\ .1 & .6 & .3 \\ .9 & .2 & .4 \end{bmatrix}
\]

20. [M] Let $A_n$ be the $n \times n$ matrix with 0’s on the main diagonal
and 1’s elsewhere. Compute $A_n^{-1}$ for $n$ = 4, 5, and 6, and
make a conjecture about the general form of $A_n^{-1}$ for larger
values of $n$.

INTRODUCTORY EXAMPLE 

Random Paths and Distortion 

In his autobiographical book, Surely You’re Joking, 
Mr. Feynman , the Nobel Prize-winning physicist Richard 
Feynman tells of observing ants in his Princeton graduate 
school apartment. He studied the ants’ behavior by providing
paper ferries to sugar suspended on a string where
the ants would not accidentally find it. When an ant would 
step onto a paper ferry, Feynman would transport the ant 
to the food and then back. After the ants learned to use 
the ferry, he relocated the return landing. The colony soon 
confused the outbound and return ferry landings, indicating 
that their “learning” consisted of creating and following 
trails. Feynman confirmed this conjecture by laying glass 
slides on the floor. Once the ants established trails on the 
glass slides, he rearranged the slides and therefore the 
trails on them. The ants followed the repositioned trails 
and Feynman could direct the ants where he wished. 

Suppose Feynman had decided to conduct additional 
investigations using a globe built of wire mesh on which 
an ant must follow individual wires and choose between 
going left and right at each intersection. If several ants and 
an equal number of food sources are placed on the globe, 
how likely is it that each ant would find its own food source 
rather than encountering another ant’s trail and following 
it to a shared resource? 1 

1 The solution to the ant-path problem (and two other applications) can 
be found in a June 2005 Mathematical Monthly article by Arthur 
Benjamin and Naomi Cameron. 



In order to record the actual routes of the ants and to 
communicate the results to others, it is convenient to use 
a rectangular map of the globe. There are many ways to 
create such maps. One simple way is to use the longitude 
and latitude on the globe as x and y coordinates on the map. 
As is the case with all maps, the result is not a faithful 
representation of the globe. Features near the “equator” 
look much the same on the globe and the map, but regions 
near the “poles” of the globe are distorted. Images of polar 
regions are much larger than the images of similar sized 
regions near the equator. To fit in with its surroundings on 
the map, the image of an ant near one of the poles should 
be larger than one near the equator. How much larger? 

Surprisingly, both the ant-path and the area distortion 
problems are best answered through the use of the determinant,
the subject of this chapter. Indeed, the determinant
has so many uses that a summary of the applications known 
in the early 1900’s filled a four-volume treatise by Thomas 
Muir. With changes in emphasis and the greatly increased 
sizes of the matrices used in modern applications, many 
uses that were important then are no longer critical today. 
Nevertheless, the determinant still plays an important role.

Beyond introducing the determinant in Section 3.1, this chapter presents two important
ideas. Section 3.2 derives an invertibility criterion for a square matrix that plays a pivotal 
role in Chapter 5. Section 3.3 shows how the determinant measures the amount by which 
a linear transformation changes the area of a figure. When applied locally, this technique 
answers the question of a map’s expansion rate near the poles. This idea plays a critical 
role in multivariable calculus in the form of the Jacobian. 


3.1 INTRODUCTION TO DETERMINANTS 


Recall from Section 2.2 that a 2 x 2 matrix is invertible if and only if its determinant 
is nonzero. To extend this useful fact to larger matrices, we need a definition for the 
determinant of an $n \times n$ matrix. We can discover the definition for the $3 \times 3$ case by
watching what happens when an invertible $3 \times 3$ matrix A is row reduced.

Consider $A = [a_{ij}]$ with $a_{11} \neq 0$. If we multiply the second and third rows of A by
$a_{11}$ and then subtract appropriate multiples of the first row from the other two rows, we
find that A is row equivalent to the following two matrices:

\[
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{11}a_{21} & a_{11}a_{22} & a_{11}a_{23} \\ a_{11}a_{31} & a_{11}a_{32} & a_{11}a_{33} \end{bmatrix}
\sim
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{11}a_{22} - a_{12}a_{21} & a_{11}a_{23} - a_{13}a_{21} \\ 0 & a_{11}a_{32} - a_{12}a_{31} & a_{11}a_{33} - a_{13}a_{31} \end{bmatrix}
\tag{1}
\]

Since A is invertible, either the (2, 2)-entry or the (3, 2)-entry on the right in (1) is
nonzero. Let us suppose that the (2, 2)-entry is nonzero. (Otherwise, we can make a
row interchange before proceeding.) Multiply row 3 by $a_{11}a_{22} - a_{12}a_{21}$, and then to the
new row 3 add $-(a_{11}a_{32} - a_{12}a_{31})$ times row 2. This will show that

\[
A \sim \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{11}a_{22} - a_{12}a_{21} & a_{11}a_{23} - a_{13}a_{21} \\ 0 & 0 & a_{11}\Delta \end{bmatrix}
\]

where

\[
\Delta = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} - a_{13}a_{22}a_{31} \tag{2}
\]

Since A is invertible, $\Delta$ must be nonzero. The converse is true, too, as we will see in
Section 3.2. We call $\Delta$ in (2) the determinant of the $3 \times 3$ matrix A.
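For readers who want to see where the last entry comes from, the (3, 3)-entry produced by the row operation just described can be expanded directly. This is a worked check written for this passage, with $\Delta$ denoting the quantity in (2); the two terms $a_{12}a_{13}a_{21}a_{31}$ cancel, and $a_{11}$ factors out of what remains.

```latex
\begin{aligned}
&(a_{11}a_{22}-a_{12}a_{21})(a_{11}a_{33}-a_{13}a_{31})
 - (a_{11}a_{32}-a_{12}a_{31})(a_{11}a_{23}-a_{13}a_{21}) \\
&\quad= a_{11}^2a_{22}a_{33} - a_{11}a_{13}a_{22}a_{31} - a_{11}a_{12}a_{21}a_{33} + a_{12}a_{13}a_{21}a_{31} \\
&\qquad - a_{11}^2a_{23}a_{32} + a_{11}a_{13}a_{21}a_{32} + a_{11}a_{12}a_{23}a_{31} - a_{12}a_{13}a_{21}a_{31} \\
&\quad= a_{11}\bigl(a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32}
 - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} - a_{13}a_{22}a_{31}\bigr) \\
&\quad= a_{11}\Delta
\end{aligned}
```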

Recall that the determinant of a $2 \times 2$ matrix, $A = [a_{ij}]$, is the number

\[ \det A = a_{11}a_{22} - a_{12}a_{21} \]

For a $1 \times 1$ matrix, say $A = [a_{11}]$, we define $\det A = a_{11}$. To generalize the definition
of the determinant to larger matrices, we’ll use $2 \times 2$ determinants to rewrite the
$3 \times 3$ determinant $\Delta$ described above. Since the terms in $\Delta$ can be grouped as

\[ (a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32}) - (a_{12}a_{21}a_{33} - a_{12}a_{23}a_{31}) + (a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31}), \]

\[
\Delta = a_{11} \cdot \det \begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix}
- a_{12} \cdot \det \begin{bmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{bmatrix}
+ a_{13} \cdot \det \begin{bmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}
\]

For brevity, write

\[ \Delta = a_{11} \cdot \det A_{11} - a_{12} \cdot \det A_{12} + a_{13} \cdot \det A_{13} \tag{3} \]

where $A_{11}$, $A_{12}$, and $A_{13}$ are obtained from A by deleting the first row and one of the
three columns. For any square matrix A, let $A_{ij}$ denote the submatrix formed by deleting
the $i$th row and $j$th column of A. For instance, if



then $A_{32}$ is obtained by crossing out row 3 and column 2,


so that



We can now give a recursive definition of a determinant. When $n = 3$, $\det A$ is defined
using determinants of the $2 \times 2$ submatrices $A_{1j}$, as in (3) above. When $n = 4$, $\det A$
uses determinants of the $3 \times 3$ submatrices $A_{1j}$. In general, an $n \times n$ determinant is
defined by determinants of $(n - 1) \times (n - 1)$ submatrices.


DEFINITION

For $n \ge 2$, the determinant of an $n \times n$ matrix $A = [a_{ij}]$ is the sum of $n$ terms
of the form $\pm a_{1j} \det A_{1j}$, with plus and minus signs alternating, where the entries
$a_{11}, a_{12}, \dots, a_{1n}$ are from the first row of A. In symbols,

\[
\det A = a_{11} \det A_{11} - a_{12} \det A_{12} + \cdots + (-1)^{1+n} a_{1n} \det A_{1n}
= \sum_{j=1}^{n} (-1)^{1+j} a_{1j} \det A_{1j}
\]
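The recursive definition translates almost line for line into code. The following is a minimal sketch, with function names `minor` and `det` chosen here for illustration; it uses 0-based indices, so the sign factor appears as (-1)^j. The test matrix is the 3 × 3 matrix that appears in the examples of this section.

```python
def minor(A, i, j):
    """Submatrix A_ij: delete row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by cofactor expansion across the first row."""
    if len(A) == 1:
        return A[0][0]          # a 1 x 1 matrix [a11] has determinant a11
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

A = [[1, 5, 0],
     [2, 4, -1],
     [0, -2, 0]]
d = det(A)
print(d)
```

This mirrors the definition rather than aiming for speed; the cost of this approach grows very quickly with n.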


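EXAMPLE 1 Compute the determinant of

\[ A = \begin{bmatrix} 1 & 5 & 0 \\ 2 & 4 & -1 \\ 0 & -2 & 0 \end{bmatrix} \]

SOLUTION Compute $\det A = a_{11} \det A_{11} - a_{12} \det A_{12} + a_{13} \det A_{13}$:

\[
\begin{aligned}
\det A &= 1 \cdot \det \begin{bmatrix} 4 & -1 \\ -2 & 0 \end{bmatrix}
 - 5 \cdot \det \begin{bmatrix} 2 & -1 \\ 0 & 0 \end{bmatrix}
 + 0 \cdot \det \begin{bmatrix} 2 & 4 \\ 0 & -2 \end{bmatrix} \\
&= 1(0 - 2) - 5(0 - 0) + 0(-4 - 0) = -2
\end{aligned}
\]
■

Another common notation for the determinant of a matrix uses a pair of vertical
lines in place of brackets. Thus the calculation in Example 1 can be written as

\[
\det A = 1 \begin{vmatrix} 4 & -1 \\ -2 & 0 \end{vmatrix}
 - 5 \begin{vmatrix} 2 & -1 \\ 0 & 0 \end{vmatrix}
 + 0 \begin{vmatrix} 2 & 4 \\ 0 & -2 \end{vmatrix} = \cdots = -2
\]

To state the next theorem, it is convenient to write the definition of $\det A$ in a slightly
different form. Given $A = [a_{ij}]$, the $(i, j)$-cofactor of A is the number $C_{ij}$ given by

\[ C_{ij} = (-1)^{i+j} \det A_{ij} \tag{4} \]

Then

\[ \det A = a_{11}C_{11} + a_{12}C_{12} + \cdots + a_{1n}C_{1n} \]

This formula is called a cofactor expansion across the first row of A. We omit the
proof of the following fundamental theorem to avoid a lengthy digression.


THEOREM 1

The determinant of an $n \times n$ matrix A can be computed by a cofactor expansion
across any row or down any column. The expansion across the $i$th row using the
cofactors in (4) is

\[ \det A = a_{i1}C_{i1} + a_{i2}C_{i2} + \cdots + a_{in}C_{in} \]

The cofactor expansion down the $j$th column is

\[ \det A = a_{1j}C_{1j} + a_{2j}C_{2j} + \cdots + a_{nj}C_{nj} \]

The plus or minus sign in the $(i, j)$-cofactor depends on the position of $a_{ij}$ in the
matrix, regardless of the sign of $a_{ij}$ itself. The factor $(-1)^{i+j}$ determines the following
checkerboard pattern of signs:

\[
\begin{bmatrix}
+ & - & + & \cdots \\
- & + & - & \\
+ & - & + & \\
\vdots & & & \ddots
\end{bmatrix}
\]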


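EXAMPLE 2 Use a cofactor expansion across the third row to compute $\det A$, where

\[ A = \begin{bmatrix} 1 & 5 & 0 \\ 2 & 4 & -1 \\ 0 & -2 & 0 \end{bmatrix} \]

SOLUTION Compute

\[
\begin{aligned}
\det A &= a_{31}C_{31} + a_{32}C_{32} + a_{33}C_{33} \\
&= (-1)^{3+1} a_{31} \det A_{31} + (-1)^{3+2} a_{32} \det A_{32} + (-1)^{3+3} a_{33} \det A_{33} \\
&= 0 \begin{vmatrix} 5 & 0 \\ 4 & -1 \end{vmatrix} - (-2) \begin{vmatrix} 1 & 0 \\ 2 & -1 \end{vmatrix} + 0 \begin{vmatrix} 1 & 5 \\ 2 & 4 \end{vmatrix} \\
&= 0 + 2(-1) + 0 = -2
\end{aligned}
\]
■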
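Theorem 1 says that every row and every column gives the same value. The sketch below checks this for the matrix of Example 2; the helper names `expand_row` and `expand_col` are assumptions made for this illustration, and indices are 0-based.

```python
def minor(A, i, j):
    """Submatrix A_ij: delete row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by cofactor expansion across the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def expand_row(A, i):
    """Cofactor expansion across row i: sum over j of a_ij * C_ij."""
    return sum((-1) ** (i + j) * A[i][j] * det(minor(A, i, j)) for j in range(len(A)))

def expand_col(A, j):
    """Cofactor expansion down column j: sum over i of a_ij * C_ij."""
    return sum((-1) ** (i + j) * A[i][j] * det(minor(A, i, j)) for i in range(len(A)))

A = [[1, 5, 0],
     [2, 4, -1],
     [0, -2, 0]]
results = [expand_row(A, i) for i in range(3)] + [expand_col(A, j) for j in range(3)]
print(results)  # all six expansions agree
```

The expansion across the third row (index 2) touches only one nonzero entry, which is why it was the convenient choice in Example 2.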


Theorem 1 is helpful for computing the determinant of a matrix that contains many 
zeros. For example, if a row is mostly zeros, then the cofactor expansion across that row 
has many terms that are zero, and the cofactors in those terms need not be calculated. 
The same approach works with a column that contains many zeros. 


EXAMPLE 3 Compute det A , where 





SOLUTION The cofactor expansion down the first column of A has all terms equal to
zero except the first. Thus

\[
\det A = 3 \cdot \begin{vmatrix} 2 & -5 & 7 & 3 \\ 0 & 1 & 5 & 0 \\ 0 & 2 & 4 & -1 \\ 0 & 0 & -2 & 0 \end{vmatrix}
+ 0 \cdot C_{21} + 0 \cdot C_{31} + 0 \cdot C_{41} + 0 \cdot C_{51}
\]

Henceforth we will omit the zero terms in the cofactor expansion. Next, expand this
$4 \times 4$ determinant down the first column, in order to take advantage of the zeros there.
We have

\[
\det A = 3 \cdot 2 \cdot \begin{vmatrix} 1 & 5 & 0 \\ 2 & 4 & -1 \\ 0 & -2 & 0 \end{vmatrix}
\]

This $3 \times 3$ determinant was computed in Example 1 and found to equal $-2$. Hence
$\det A = 3 \cdot 2 \cdot (-2) = -12$. ■


The matrix in Example 3 was nearly triangular. The method in that example is easily 
adapted to prove the following theorem. 


THEOREM 2 


If A is a triangular matrix, then $\det A$ is the product of the entries on the main
diagonal of A.


The strategy in Example 3 of looking for zeros works extremely well when an entire 
row or column consists of zeros. In such a case, the cofactor expansion along such a row 
or column is a sum of zeros! So the determinant is zero. Unfortunately, most cofactor 
expansions are not so quickly evaluated. 

NUMERICAL NOTE

By today’s standards, a $25 \times 25$ matrix is small. Yet it would be impossible
to calculate a $25 \times 25$ determinant by cofactor expansion. In general, a cofactor
expansion requires more than $n!$ multiplications, and 25! is approximately
$1.5 \times 10^{25}$.

If a computer performs one trillion multiplications per second, it would have
to run for more than 500,000 years to compute a $25 \times 25$ determinant by this
method. Fortunately, there are faster methods, as we’ll soon discover.
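The estimate in this note can be checked with a small counting sketch. The counting model is an assumption made here: each of the n first-row terms costs one (n-1) × (n-1) determinant plus one multiplication by the entry, sign changes are free, and no work is skipped for zero entries. The function name `cofactor_multiplications` is likewise only illustrative.

```python
import math

def cofactor_multiplications(n):
    """Multiplications used by a naive cofactor expansion of an n x n determinant."""
    count = 0                      # a 1 x 1 determinant needs no multiplication
    for k in range(2, n + 1):
        # k terms, each: one (k-1) x (k-1) determinant plus one multiplication.
        count = k * (count + 1)
    return count

mults = cofactor_multiplications(25)
seconds = mults / 1e12             # one trillion multiplications per second
years = seconds / (365 * 24 * 3600)
print(f"{math.factorial(25):.3e} {mults:.3e} {years:,.0f}")
```

Under this model the count exceeds 25!, and the running time comes out well above the 500,000 years quoted in the note.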


Exercises 19-38 explore important properties of determinants, mostly for the $2 \times 2$
case. The results from Exercises 33-36 will be used in the next section to derive the
analogous properties for $n \times n$ matrices.

PRACTICE PROBLEM

Compute
\[
\begin{vmatrix} 5 & -7 & 2 & 2 \\ 0 & 3 & 0 & -4 \\ -5 & -8 & 0 & 3 \\ 0 & 5 & 0 & -6 \end{vmatrix}
\]

3.1 EXERCISES 


Compute the determinants in Exercises 1-8 using a cofactor 
expansion across the first row. In Exercises 1-4, also compute the 
determinant by a cofactor expansion down the second column. 



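1. $\begin{vmatrix} 3 & 0 & 4 \\ 2 & 3 & 2 \\ 0 & 5 & -1 \end{vmatrix}$

2. $\begin{vmatrix} 0 & 4 & 1 \\ 5 & -3 & 0 \\ 2 & 3 & 1 \end{vmatrix}$

3. $\begin{vmatrix} 2 & -2 & 3 \\ 3 & 1 & 2 \\ 1 & 3 & -1 \end{vmatrix}$

4. $\begin{vmatrix} 1 & 2 & 4 \\ 3 & 1 & 1 \\ 2 & 4 & 2 \end{vmatrix}$

5. $\begin{vmatrix} 2 & 3 & -3 \\ 4 & 0 & 3 \\ 6 & 1 & 5 \end{vmatrix}$

6. $\begin{vmatrix} 5 & -2 & 2 \\ 0 & 3 & -3 \\ 2 & -4 & 7 \end{vmatrix}$

7. $\begin{vmatrix} 4 & 3 & 0 \\ 6 & 5 & 2 \\ 9 & 7 & 3 \end{vmatrix}$

8. $\begin{vmatrix} 4 & 1 & 2 \\ 4 & 0 & 3 \\ 3 & -2 & 5 \end{vmatrix}$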


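Compute the determinants in Exercises 9-14 by cofactor expansions.
At each step, choose a row or column that involves the least
amount of computation.

9. $\begin{vmatrix} 4 & 0 & 0 & 5 \\ 1 & 7 & 2 & -5 \\ 3 & 0 & 0 & 0 \\ 8 & 3 & 1 & 7 \end{vmatrix}$

10. $\begin{vmatrix} 1 & -2 & 5 & 2 \\ 0 & 0 & 3 & 0 \\ 2 & -4 & -3 & 5 \\ 2 & 0 & 3 & 5 \end{vmatrix}$

11. $\begin{vmatrix} 3 & 5 & -6 & 4 \\ 0 & -2 & 3 & -3 \\ 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 3 \end{vmatrix}$

12. $\begin{vmatrix} 3 & 0 & 0 & 0 \\ 7 & -2 & 0 & 0 \\ 2 & 6 & 3 & 0 \\ 3 & -8 & 4 & -3 \end{vmatrix}$

13. $\begin{vmatrix} 4 & 0 & -7 & 3 & -5 \\ 0 & 0 & 2 & 0 & 0 \\ 7 & 3 & -6 & 4 & -8 \\ 5 & 0 & 5 & 2 & -3 \\ 0 & 0 & 9 & -1 & 2 \end{vmatrix}$

14. $\begin{vmatrix} 6 & 3 & 2 & 4 & 0 \\ 9 & 0 & -4 & 1 & 0 \\ 8 & -5 & 6 & 7 & 1 \\ 2 & 0 & 0 & 0 & 0 \\ 4 & 2 & 3 & 2 & 0 \end{vmatrix}$

The expansion of a $3 \times 3$ determinant can be remembered by the
following device. Write a second copy of the first two columns to
the right of the matrix, and compute the determinant by multiplying
entries on six diagonals:

\[
\begin{array}{ccccc}
a_{11} & a_{12} & a_{13} & a_{11} & a_{12} \\
a_{21} & a_{22} & a_{23} & a_{21} & a_{22} \\
a_{31} & a_{32} & a_{33} & a_{31} & a_{32}
\end{array}
\]

Add the downward diagonal products and subtract the upward
products. Use this method to compute the determinants in
Exercises 15-18. Warning: This trick does not generalize in any
reasonable way to $4 \times 4$ or larger matrices.

15. $\begin{vmatrix} 1 & 0 & 4 \\ 2 & 3 & 2 \\ 0 & 5 & -2 \end{vmatrix}$

16. $\begin{vmatrix} 0 & 3 & 1 \\ 4 & -5 & 0 \\ 3 & 4 & 1 \end{vmatrix}$

17. $\begin{vmatrix} 2 & -3 & 3 \\ 3 & 2 & 2 \\ 1 & 3 & -1 \end{vmatrix}$

18. $\begin{vmatrix} 1 & 3 & 4 \\ 2 & 3 & 1 \\ 3 & 3 & 2 \end{vmatrix}$

In Exercises 19-24, explore the effect of an elementary row
operation on the determinant of a matrix. In each case, state the
row operation and describe how it affects the determinant.

19. $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$, $\begin{bmatrix} c & d \\ a & b \end{bmatrix}$

20. $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$, $\begin{bmatrix} a & b \\ kc & kd \end{bmatrix}$

21. $\begin{bmatrix} 3 & 2 \\ 5 & 4 \end{bmatrix}$, $\begin{bmatrix} 3 & 2 \\ 5 + 3k & 4 + 2k \end{bmatrix}$

22. $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$, $\begin{bmatrix} a + kc & b + kd \\ c & d \end{bmatrix}$

23. $\begin{bmatrix} a & b & c \\ 3 & 2 & 1 \\ 4 & 5 & 6 \end{bmatrix}$, $\begin{bmatrix} 3 & 2 & 1 \\ a & b & c \\ 4 & 5 & 6 \end{bmatrix}$

24. $\begin{bmatrix} 1 & 0 & 1 \\ -3 & 4 & -4 \\ 2 & -3 & 1 \end{bmatrix}$, $\begin{bmatrix} k & 0 & k \\ -3 & 4 & -4 \\ 2 & -3 & 1 \end{bmatrix}$

Compute the determinants of the elementary matrices given in
Exercises 25-30. (See Section 2.2.)

25. $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & k & 1 \end{bmatrix}$

26. $\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$

27. $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ k & 0 & 1 \end{bmatrix}$

28. $\begin{bmatrix} k & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

29. $\begin{bmatrix} 1 & 0 & 0 \\ 0 & k & 0 \\ 0 & 0 & 1 \end{bmatrix}$

30. $\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

Use Exercises 25-28 to answer the questions in Exercises 31
and 32. Give reasons for your answers.

31. What is the determinant of an elementary row replacement
matrix?

32. What is the determinant of an elementary scaling matrix with
$k$ on the diagonal?

In Exercises 33-36, verify that $\det EA = (\det E)(\det A)$, where
$E$ is the elementary matrix shown and $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$.

33. $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$

34. $\begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix}$

35. $\begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}$

36. $\begin{bmatrix} 1 & 0 \\ k & 1 \end{bmatrix}$

37. Let $A = \begin{bmatrix} 3 & 1 \\ 4 & 2 \end{bmatrix}$. Write $5A$. Is $\det 5A = 5 \det A$?

38. Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ and let $k$ be a scalar. Find a formula that
relates $\det kA$ to $k$ and $\det A$.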

In Exercises 39 and 40, A is an n x n matrix. Mark each statement 
True or False. Justify each answer. 

39. a. An $n \times n$ determinant is defined by determinants of $(n-1) \times (n-1)$ submatrices.

b. The $(i, j)$-cofactor of a matrix $A$ is the matrix $A_{ij}$ obtained by deleting from $A$ its $i$th row and $j$th column.

40. a. The cofactor expansion of det A down a column is equal to the cofactor expansion along a row.

b. The determinant of a triangular matrix is the sum of the entries on the main diagonal.


41. Let u = 



and v = 



. Compute the area of the parallelogram determined by u, v, u + v, and 0, and compute the determinant of [ u v ]. How do they compare? Replace the first entry of v by an arbitrary number x, and repeat the problem. Draw a picture and explain what you find.


42. Let u = 



and v = 



, where a, b, and c are positive (for simplicity). Compute the area of the parallelogram determined by u, v, u + v, and 0, and compute the determinants of the matrices [ u v ] and [ v u ]. Draw a picture and explain what you find.


43. [M] Construct a random $4 \times 4$ matrix A with integer entries between -9 and 9. How is $\det A^{-1}$ related to $\det A$? Experiment with random $n \times n$ integer matrices for n = 4, 5, and 6, and make a conjecture. Note: In the unlikely event that you encounter a matrix with a zero determinant, reduce it to echelon form and discuss what you find.

44. [M] Is it true that $\det AB = (\det A)(\det B)$? To find out, generate random $5 \times 5$ matrices A and B, and compute $\det AB - (\det A)(\det B)$. Repeat the calculations for three other pairs of $n \times n$ matrices, for various values of n. Report your results.

45. [M] Is it true that $\det(A + B) = \det A + \det B$? Experiment with four pairs of random matrices as in Exercise 44, and make a conjecture.

46. [M] Construct a random $4 \times 4$ matrix A with integer entries between -9 and 9, and compare $\det A$ with $\det A^T$, $\det(-A)$, $\det(2A)$, and $\det(10A)$. Repeat with two other random $4 \times 4$ integer matrices, and make conjectures about how these determinants are related. (Refer to Exercise 36 in Section 2.1.) Then check your conjectures with several random $5 \times 5$ and $6 \times 6$ integer matrices. Modify your conjectures, if necessary, and report your results.


SOLUTION TO PRACTICE PROBLEM 


Take advantage of the zeros. Begin with a cofactor expansion down the third column to 
obtain a 3 x 3 matrix, which may be evaluated by an expansion down its first column. 




The $(-1)^{2+1}$ in the next-to-last calculation came from the (2, 1)-position of the -5 in the $3 \times 3$ determinant.


3.2 PROPERTIES OF DETERMINANTS 


The secret of determinants lies in how they change when row operations are performed. 
The following theorem generalizes the results of Exercises 19-24 in Section 3.1. The 
proof is at the end of this section. 


THEOREM 3 Row Operations 

Let A be a square matrix. 

a. If a multiple of one row of A is added to another row to produce a matrix B , 
then det B = det A . 

b. If two rows of A are interchanged to produce B , then det B = — det A. 

c. If one row of A is multiplied by k to produce B, then det B = k • det A.
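The three parts of Theorem 3 can be checked numerically on a small case. The sketch below is only an illustration: the matrix `A`, the scalar `k`, and the helper `det` (a plain cofactor expansion) are assumptions introduced here, not part of the text.

```python
def det(m):
    """Determinant by cofactor expansion across the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        # Delete row 1 and column j+1 to form the submatrix.
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


# Hypothetical example matrix, chosen only for illustration.
A = [[1, 2, 0],
     [3, 1, 4],
     [0, 5, 2]]
k = 5

# (a) replacement: add 2 times row 1 to row 2
B_replace = [A[0], [x + 2 * y for x, y in zip(A[1], A[0])], A[2]]
# (b) interchange rows 1 and 2
B_swap = [A[1], A[0], A[2]]
# (c) multiply row 3 by k
B_scale = [A[0], A[1], [k * x for x in A[2]]]

print(det(A), det(B_replace), det(B_swap), det(B_scale))
```

The four printed values should satisfy det B = det A, det B = -det A, and det B = k det A, in that order.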
The following examples show how to use Theorem 3 to find determinants efficiently.


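EXAMPLE 1 Compute det A, where $A = \begin{bmatrix} 1 & -4 & 2 \\ -2 & 8 & -9 \\ -1 & 7 & 0 \end{bmatrix}$.

SOLUTION The strategy is to reduce A to echelon form and then to use the fact that the determinant of a triangular matrix is the product of the diagonal entries. The first two row replacements in column 1 do not change the determinant:

\[
\det A = \begin{vmatrix} 1 & -4 & 2 \\ -2 & 8 & -9 \\ -1 & 7 & 0 \end{vmatrix}
= \begin{vmatrix} 1 & -4 & 2 \\ 0 & 0 & -5 \\ -1 & 7 & 0 \end{vmatrix}
= \begin{vmatrix} 1 & -4 & 2 \\ 0 & 0 & -5 \\ 0 & 3 & 2 \end{vmatrix}
\]

An interchange of rows 2 and 3 reverses the sign of the determinant, so

\[
\det A = -\begin{vmatrix} 1 & -4 & 2 \\ 0 & 3 & 2 \\ 0 & 0 & -5 \end{vmatrix} = -(1)(3)(-5) = 15
\]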


A common use of Theorem 3(c) in hand calculations is to factor out a common 
multiple of one row of a matrix. For instance, 


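\[
\begin{vmatrix} * & * & * \\ 5k & -2k & 3k \\ * & * & * \end{vmatrix}
= k \begin{vmatrix} * & * & * \\ 5 & -2 & 3 \\ * & * & * \end{vmatrix}
\]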

where the starred entries are unchanged. We use this step in the next example. 


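EXAMPLE 2 Compute det A, where $A = \begin{bmatrix} 2 & -8 & 6 & 8 \\ 3 & -9 & 5 & 10 \\ -3 & 0 & 1 & -2 \\ 1 & -4 & 0 & 6 \end{bmatrix}$.

SOLUTION To simplify the arithmetic, we want a 1 in the upper-left corner. We could interchange rows 1 and 4. Instead, we factor out 2 from the top row, and then proceed with row replacements in the first column:

\[
\det A = 2 \begin{vmatrix} 1 & -4 & 3 & 4 \\ 3 & -9 & 5 & 10 \\ -3 & 0 & 1 & -2 \\ 1 & -4 & 0 & 6 \end{vmatrix}
= 2 \begin{vmatrix} 1 & -4 & 3 & 4 \\ 0 & 3 & -4 & -2 \\ 0 & -12 & 10 & 10 \\ 0 & 0 & -3 & 2 \end{vmatrix}
\]

Next, we could factor out another 2 from row 3 or use the 3 in the second column as a pivot. We choose the latter operation, adding 4 times row 2 to row 3:

\[
\det A = 2 \begin{vmatrix} 1 & -4 & 3 & 4 \\ 0 & 3 & -4 & -2 \\ 0 & 0 & -6 & 2 \\ 0 & 0 & -3 & 2 \end{vmatrix}
\]

Finally, adding -1/2 times row 3 to row 4, and computing the “triangular” determinant, we find that

\[
\det A = 2 \begin{vmatrix} 1 & -4 & 3 & 4 \\ 0 & 3 & -4 & -2 \\ 0 & 0 & -6 & 2 \\ 0 & 0 & 0 & 1 \end{vmatrix}
= 2 \cdot (1)(3)(-6)(1) = -36
\]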
FIGURE 1 Typical echelon forms of square matrices: one with a pivot in every diagonal position ($\det U \neq 0$) and one with a zero on the diagonal ($\det U = 0$).


Suppose a square matrix A has been reduced to an echelon form U by row replacements and row interchanges. (This is always possible. See the row reduction algorithm in Section 1.2.) If there are r interchanges, then Theorem 3 shows that

\[
\det A = (-1)^r \det U
\]

Since U is in echelon form, it is triangular, and so det U is the product of the diagonal entries $u_{11}, \ldots, u_{nn}$. If A is invertible, the entries $u_{ii}$ are all pivots (because $A \sim I_n$ and the $u_{ii}$ have not been scaled to 1’s). Otherwise, at least $u_{nn}$ is zero, and the product $u_{11} \cdots u_{nn}$ is zero. See Figure 1. Thus

\[
\det A = \begin{cases} (-1)^r \cdot \left( \text{product of pivots in } U \right) & \text{when } A \text{ is invertible} \\ 0 & \text{when } A \text{ is not invertible} \end{cases} \tag{1}
\]


It is interesting to note that although the echelon form U described above is not unique 
(because it is not completely row reduced), and the pivots are not unique, the product 
of the pivots is unique, except for a possible minus sign. 
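Formula (1) translates directly into a short procedure. The following is a minimal sketch, assuming exact rational arithmetic and using only row replacements and row interchanges; the function name `det_by_row_reduction` is hypothetical and this is not a description of how any particular software package computes determinants.

```python
from fractions import Fraction


def det_by_row_reduction(rows):
    """Determinant via formula (1): (-1)^r times the product of the pivots."""
    a = [[Fraction(x) for x in row] for row in rows]
    n = len(a)
    interchanges = 0
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot_row = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot_row is None:
            return Fraction(0)  # no pivot in this column: A is not invertible
        if pivot_row != col:
            a[col], a[pivot_row] = a[pivot_row], a[col]
            interchanges += 1   # each interchange reverses the sign
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            # Row replacement: does not change the determinant.
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    product = Fraction(1)
    for i in range(n):
        product *= a[i][i]
    return (-1) ** interchanges * product


print(det_by_row_reduction([[1, -4, 2], [-2, 8, -9], [-1, 7, 0]]))
```

Applied to the 3 x 3 matrix reduced in Example 1, the procedure needs one interchange and should reproduce the value found there.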

Formula (1) not only gives a concrete interpretation of det A but also proves the 
main theorem of this section: 


THEOREM 4 


A square matrix A is invertible if and only if $\det A \neq 0$.


Theorem 4 adds the statement “$\det A \neq 0$” to the Invertible Matrix Theorem. A useful corollary is that det A = 0 when the columns of A are linearly dependent. Also, det A = 0 when the rows of A are linearly dependent. (Rows of A are columns of $A^T$, and linearly dependent columns of $A^T$ make $A^T$ singular. When $A^T$ is singular, so is A, by the Invertible Matrix Theorem.) In practice, linear dependence is obvious when two columns or two rows are the same or a column or a row is zero.


EXAMPLE 3 Compute det A, where A = 



SOLUTION Add 2 times row 1 to row 3 to obtain 


det A 






because the second and third rows of the second matrix are equal. 


NUMERICAL NOTES

1. Most computer programs that compute det A for a general matrix A use the method of formula (1) above.

2. It can be shown that evaluation of an $n \times n$ determinant using row operations requires about $2n^3/3$ arithmetic operations. Any modern microcomputer can calculate a $25 \times 25$ determinant in a fraction of a second, since only about 10,000 operations are required.


Computers can also handle large “sparse” matrices, with special routines that take advantage of the presence of many zeros. Of course, zero entries can speed hand computations, too. The calculations in the next example combine the power of row operations with the strategy from Section 3.1 of using zero entries in cofactor expansions.


EXAMPLE 4 Compute det A, where A = 



SOLUTION A good way to begin is to use the 2 in column 1 as a pivot, eliminating the -2 below it. Then use a cofactor expansion to reduce the size of the determinant, followed by another row replacement operation. After the first step the first column is (0, 2, 0, 0), and expanding down that column gives

\[
\det A = -2 \begin{vmatrix} 1 & 2 & -1 \\ 3 & 6 & 2 \\ 0 & -3 & 1 \end{vmatrix}
= -2 \begin{vmatrix} 1 & 2 & -1 \\ 0 & 0 & 5 \\ 0 & -3 & 1 \end{vmatrix}
\]

An interchange of rows 2 and 3 would produce a “triangular determinant.” Another approach is to make a cofactor expansion down the first column:

\[
\det A = (-2)(1) \begin{vmatrix} 0 & 5 \\ -3 & 1 \end{vmatrix} = -2 \cdot (15) = -30
\]


Column Operations

We can perform operations on the columns of a matrix in a way that is analogous to the row operations we have considered. The next theorem shows that column operations have the same effects on determinants as row operations.

Remark: The Principle of Mathematical Induction says the following: Let $P(n)$ be a statement that is either true or false for each natural number n. Then $P(n)$ is true for all $n \geq 1$ provided that $P(1)$ is true, and for each natural number k, if $P(k)$ is true, then $P(k + 1)$ is true. The Principle of Mathematical Induction is used to prove the next theorem.


THEOREM 5

If A is an $n \times n$ matrix, then $\det A^T = \det A$.


PROOF The theorem is obvious for n = 1. Suppose the theorem is true for $k \times k$ determinants and let n = k + 1. Then the cofactor of $a_{1j}$ in A equals the cofactor of $a_{j1}$ in $A^T$, because the cofactors involve $k \times k$ determinants. Hence the cofactor expansion of det A along the first row equals the cofactor expansion of $\det A^T$ down the first column. That is, A and $A^T$ have equal determinants. The theorem is true for n = 1, and the truth of the theorem for one value of n implies its truth for the next value of n. By the Principle of Mathematical Induction, the theorem is true for all $n \geq 1$.

Because of Theorem 5, each statement in Theorem 3 is true when the word row is replaced everywhere by column. To verify this property, one merely applies the original Theorem 3 to $A^T$. A row operation on $A^T$ amounts to a column operation on A.

Column operations are useful for both theoretical purposes and hand computations. 
However, for simplicity we’ll perform only row operations in numerical calculations. 
Determinants and Matrix Products

The proof of the following useful theorem is at the end of the section. Applications are in the exercises.


THEOREM 6 Multiplicative Property

If A and B are $n \times n$ matrices, then $\det AB = (\det A)(\det B)$.


EXAMPLE 5 Verify Theorem 6 for A = 



and B 




SOLUTION

\[
AB = \begin{bmatrix} 25 & 20 \\ 14 & 13 \end{bmatrix}
\]

and

\[
\det AB = 25 \cdot 13 - 20 \cdot 14 = 325 - 280 = 45
\]

Since det A = 9 and det B = 5,

\[
(\det A)(\det B) = 9 \cdot 5 = 45 = \det AB
\]


3 

2 


Warning: A common misconception is that Theorem 6 has an analogue for sums of 
matrices. However, det (A + B) is not equal to det A + det B , in general. 
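Both the multiplicative property and the warning about sums can be checked on a small case. In the sketch below, the matrices `A` and `B` are assumptions: they are one pair chosen to be consistent with the product and the determinants quoted in Example 5, and the helper names `det2` and `matmul2` are hypothetical.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix given as a list of rows."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def matmul2(a, b):
    """Product of two 2 x 2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Assumed matrices: chosen so that det A = 9, det B = 5, and AB has rows (25, 20), (14, 13).
A = [[6, 1], [3, 2]]
B = [[4, 3], [1, 2]]

AB = matmul2(A, B)
A_plus_B = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

print(det2(AB), det2(A) * det2(B))        # equal, as Theorem 6 asserts
print(det2(A_plus_B), det2(A) + det2(B))  # not equal for this pair
```

For this pair the determinant of the sum differs from the sum of the determinants, which is the point of the warning.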


A Linearity Property of the Determinant Function 


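For an $n \times n$ matrix A, we can consider det A as a function of the n column vectors in A. We will show that if all columns except one are held fixed, then det A is a linear function of that one (vector) variable.

Suppose that the jth column of A is allowed to vary, and write

\[
A = \begin{bmatrix} \mathbf{a}_1 & \cdots & \mathbf{a}_{j-1} & \mathbf{x} & \mathbf{a}_{j+1} & \cdots & \mathbf{a}_n \end{bmatrix}
\]

Define a transformation T from $\mathbb{R}^n$ to $\mathbb{R}$ by

\[
T(\mathbf{x}) = \det \begin{bmatrix} \mathbf{a}_1 & \cdots & \mathbf{a}_{j-1} & \mathbf{x} & \mathbf{a}_{j+1} & \cdots & \mathbf{a}_n \end{bmatrix}
\]

Then,

\[
T(c\mathbf{x}) = cT(\mathbf{x}) \quad \text{for all scalars } c \text{ and all } \mathbf{x} \text{ in } \mathbb{R}^n \tag{2}
\]
\[
T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v}) \quad \text{for all } \mathbf{u}, \mathbf{v} \text{ in } \mathbb{R}^n \tag{3}
\]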


Property (2) is Theorem 3(c) applied to the columns of A. A proof of property (3) follows from a cofactor expansion of det A down the jth column. (See Exercise 43.) This (multi-) linearity property of the determinant turns out to have many useful consequences that are studied in more advanced courses.
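Properties (2) and (3) can be illustrated with a small numerical check. This is only a sketch: the fixed columns `a1` and `a3`, the test vectors `u` and `v`, and the scalar `c` are arbitrary assumptions, and `det` is a plain cofactor expansion written for this example.

```python
def det(m):
    """Determinant by cofactor expansion across the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def T(x, a1=(1, 0, 2), a3=(3, 1, -1)):
    """det [a1 x a3]: columns a1 and a3 are held fixed, the middle column varies."""
    columns = [a1, x, a3]
    rows = [list(r) for r in zip(*columns)]  # columns -> rows
    return det(rows)


u = (2, -1, 4)
v = (0, 5, 1)
c = 7

scaled = tuple(c * t for t in u)
summed = tuple(s + t for s, t in zip(u, v))

print(T(scaled) == c * T(u))       # property (2)
print(T(summed) == T(u) + T(v))    # property (3)
```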


Proofs of Theorems 3 and 6 

It is convenient to prove Theorem 3 when it is stated in terms of the elementary matrices discussed in Section 2.2. We call an elementary matrix E a row replacement (matrix) if E is obtained from the identity I by adding a multiple of one row to another row; E is an interchange if E is obtained by interchanging two rows of I; and E is a scale by r if E is obtained by multiplying a row of I by a nonzero scalar r. With this terminology, Theorem 3 can be reformulated as follows:

If A is an $n \times n$ matrix and E is an $n \times n$ elementary matrix, then

\[
\det EA = (\det E)(\det A)
\]

where

\[
\det E = \begin{cases} 1 & \text{if } E \text{ is a row replacement} \\ -1 & \text{if } E \text{ is an interchange} \\ r & \text{if } E \text{ is a scale by } r \end{cases}
\]


PROOF OF THEOREM 3 The proof is by induction on the size of A. The case of a $2 \times 2$ matrix was verified in Exercises 33-36 of Section 3.1. Suppose the theorem has been verified for determinants of $k \times k$ matrices with $k \geq 2$, let n = k + 1, and let A be $n \times n$. The action of E on A involves either two rows or only one row. So we can expand det EA across a row that is unchanged by the action of E, say, row i. Let $A_{ij}$ (respectively, $B_{ij}$) be the matrix obtained by deleting row i and column j from A (respectively, EA). Then the rows of $B_{ij}$ are obtained from the rows of $A_{ij}$ by the same type of elementary row operation that E performs on A. Since these submatrices are only $k \times k$, the induction assumption implies that

\[
\det B_{ij} = \alpha \cdot \det A_{ij}
\]

where $\alpha = 1$, $-1$, or $r$, depending on the nature of E. The cofactor expansion across row i is

\[
\begin{aligned}
\det EA &= a_{i1}(-1)^{i+1} \det B_{i1} + \cdots + a_{in}(-1)^{i+n} \det B_{in} \\
&= \alpha a_{i1}(-1)^{i+1} \det A_{i1} + \cdots + \alpha a_{in}(-1)^{i+n} \det A_{in} \\
&= \alpha \cdot \det A
\end{aligned}
\]

In particular, taking $A = I_n$, we see that det E = 1, -1, or r, depending on the nature of E. Thus the theorem is true for n = 2, and the truth of the theorem for one value of n implies its truth for the next value of n. By the principle of induction, the theorem must be true for $n \geq 2$. The theorem is trivially true for n = 1. ■


PROOF OF THEOREM 6 If A is not invertible, then neither is AB, by Exercise 27 in Section 2.3. In this case, $\det AB = (\det A)(\det B)$, because both sides are zero, by Theorem 4. If A is invertible, then A and the identity matrix $I_n$ are row equivalent by the Invertible Matrix Theorem. So there exist elementary matrices $E_1, \ldots, E_p$ such that

\[
A = E_p E_{p-1} \cdots E_1 \cdot I_n = E_p E_{p-1} \cdots E_1
\]

For brevity, write $|A|$ for det A. Then repeated application of Theorem 3, as rephrased above, shows that

\[
\begin{aligned}
|AB| &= |E_p \cdots E_1 B| = |E_p|\,|E_{p-1} \cdots E_1 B| = \cdots \\
&= |E_p| \cdots |E_1|\,|B| = \cdots = |E_p \cdots E_1|\,|B| \\
&= |A|\,|B|
\end{aligned}
\]






PRACTICE PROBLEMS 


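1. Compute $\begin{vmatrix} 1 & -3 & 1 & -2 \\ 2 & -5 & -1 & -2 \\ 0 & -4 & 5 & 1 \\ -3 & 10 & -6 & 8 \end{vmatrix}$ in as few steps as possible.

2. Use a determinant to decide if $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ are linearly independent, when

\[
\mathbf{v}_1 = \begin{bmatrix} 5 \\ -7 \\ 9 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} -3 \\ 3 \\ -5 \end{bmatrix}, \quad
\mathbf{v}_3 = \begin{bmatrix} 2 \\ -7 \\ 5 \end{bmatrix}
\]

3. Let A be an $n \times n$ matrix such that $A^2 = I$. Show that $\det A = \pm 1$.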


3.2 EXERCISES 


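Each equation in Exercises 1-4 illustrates a property of determinants. State the property.

1. $\begin{vmatrix} 0 & 5 & -2 \\ 1 & -3 & 6 \\ 4 & -1 & 8 \end{vmatrix} = -\begin{vmatrix} 1 & -3 & 6 \\ 0 & 5 & -2 \\ 4 & -1 & 8 \end{vmatrix}$

2. $\begin{vmatrix} 1 & 2 & 2 \\ 0 & 3 & -4 \\ 3 & 7 & 4 \end{vmatrix} = \begin{vmatrix} 1 & 2 & 2 \\ 0 & 3 & -4 \\ 0 & 1 & -2 \end{vmatrix}$

3. $\begin{vmatrix} 3 & -6 & 9 \\ 3 & 5 & -5 \\ 1 & 3 & 3 \end{vmatrix} = 3 \begin{vmatrix} 1 & -2 & 3 \\ 3 & 5 & -5 \\ 1 & 3 & 3 \end{vmatrix}$

4. $\begin{vmatrix} 1 & 3 & -4 \\ 2 & 0 & -3 \\ 3 & -5 & 2 \end{vmatrix} = \begin{vmatrix} 1 & 3 & -4 \\ 0 & -6 & 5 \\ 3 & -5 & 2 \end{vmatrix}$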


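Find the determinants in Exercises 5-10 by row reduction to echelon form.

5. $\begin{vmatrix} 1 & 5 & -4 \\ -1 & -4 & 5 \\ -2 & -8 & 7 \end{vmatrix}$

6. $\begin{vmatrix} 3 & 3 & -3 \\ 3 & 4 & -4 \\ 2 & -3 & -5 \end{vmatrix}$

7. $\begin{vmatrix} 1 & 3 & 0 & 2 \\ -2 & -5 & 7 & 4 \\ 3 & 5 & 2 & 1 \\ 1 & -1 & 2 & -3 \end{vmatrix}$

8. $\begin{vmatrix} 1 & 3 & 2 & -4 \\ 0 & 1 & 2 & -5 \\ 2 & 7 & 6 & -3 \\ -3 & -10 & -7 & 2 \end{vmatrix}$

9. $\begin{vmatrix} 1 & -1 & -3 & 0 \\ 0 & 1 & 5 & 4 \\ -1 & 0 & 5 & 3 \\ 3 & -3 & -2 & 3 \end{vmatrix}$

10. $\begin{vmatrix} 1 & 3 & -1 & 0 & -2 \\ 0 & 2 & -4 & -2 & -6 \\ -2 & -6 & 2 & 3 & 10 \\ 1 & 5 & -6 & 2 & -3 \\ 0 & 2 & -4 & 5 & 9 \end{vmatrix}$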








2 

5 

4 

1 


1 

5 

4 

1 

4 

7 

6 

2 

14. 

0 

-2 

-4 

0 

6 

-2 

-4 

0 

3 

5 

4 

1 

-6 

7 

7 

0 


-6 

5 

5 

0 


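Find the determinants in Exercises 15-20, where

\[
\begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix} = 7.
\]

15. $\begin{vmatrix} a & b & c \\ d & e & f \\ 3g & 3h & 3i \end{vmatrix}$

16. $\begin{vmatrix} a & b & c \\ 5d & 5e & 5f \\ g & h & i \end{vmatrix}$

17. $\begin{vmatrix} a+d & b+e & c+f \\ d & e & f \\ g & h & i \end{vmatrix}$

18. $\begin{vmatrix} d & e & f \\ a & b & c \\ g & h & i \end{vmatrix}$

19. $\begin{vmatrix} a & b & c \\ 2d+a & 2e+b & 2f+c \\ g & h & i \end{vmatrix}$

20. $\begin{vmatrix} a & b & c \\ d+3g & e+3h & f+3i \\ g & h & i \end{vmatrix}$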


In Exercises 21-23, use determinants to find out if the matrix is 
invertible. 



<N 

1_ 

6 

1 

0 



"5 

1 

1 

21. 

1 

3 

2 


22. 

1 

-3 

-2 


_ 3 

9 

2 _ 



0 

5 

3 _ 

23. 

"2 

1 

0 

-7 

0 

-5 

6" 

0 





3 

8 

6 

0 






0 

7 

5 

4 






In Exercises 24-26, use determinants to decide if the set of vectors 
is linearly independent. 


Combine the methods of row reduction and cofactor expansion to 
compute the determinants in Exercises 11-14. 


24 


4 

6 

2 



~-7~ 


1 

_ 1 

5 

0 

5 

-5 


7 


1 

<N 

_1 


3 

4 

-3 

-1 


-1 

2 

3 

0 







3 

0 

1 

-3 

12. 

3 

4 

3 

0 


7" 


~-8~ 


7" 

-6 

0 

-4 

3 

11 

4 

6 

6 

25. 

-4 


5 

9 

0 

6 

8 

-4 

-1 


4 

2 

4 

3 


-6 


7 


-5 


11 . 
In Exercises 27 and 28, A and B are $n \times n$ matrices. Mark each statement True or False. Justify each answer.

27. a. A row replacement operation does not affect the determinant of a matrix.

b. The determinant of A is the product of the pivots in any echelon form U of A, multiplied by $(-1)^r$, where r is the number of row interchanges made during row reduction from A to U.

c. If the columns of A are linearly dependent, then det A = 0.

d. det(A + B) = det A + det B.

28. a. If three row interchanges are made in succession, then the new determinant equals the old determinant.

b. The determinant of A is the product of the diagonal entries in A.

c. If det A is zero, then two rows or two columns are the same, or a row or a column is zero.

d. $\det A^{-1} = (-1) \det A$.


29. Compute $\det B^4$, where B = 




30. Use Theorem 3 (but not Theorem 4) to show that if two rows 
of a square matrix A are equal, then det A = 0. The same is 
true for two columns. Why? 


In Exercises 31-36, mention an appropriate theorem in your explanation.

31. Show that if A is invertible, then $\det A^{-1} = \dfrac{1}{\det A}$.

32. Suppose that A is a square matrix such that $\det A^3 = 0$. Explain why A cannot be invertible.

33. Let A and B be square matrices. Show that even though AB and BA may not be equal, it is always true that det AB = det BA.

34. Let A and P be square matrices, with P invertible. Show that $\det(PAP^{-1}) = \det A$.

35. Let U be a square matrix such that $U^T U = I$. Show that $\det U = \pm 1$.

36. Find a formula for det(rA) when A is an $n \times n$ matrix.






39. Let A and B be $3 \times 3$ matrices, with det A = -3 and det B = 4. Use properties of determinants (in the text and in the exercises above) to compute:

a. $\det AB$  b. $\det 5A$  c. $\det B^T$

d. $\det A^{-1}$  e. $\det A^3$


40. Let A and B be $4 \times 4$ matrices, with det A = -3 and det B = -1. Compute:

a. $\det AB$  b. $\det B^5$  c. $\det 2A$

d. $\det A^T BA$  e. $\det B^{-1}AB$


41. Verify that det A = det B + det C, where 




a + e 
c 




b + f 

d 

0 " 

1 



and 





Show 


det(A + B) = det A + det B if and only if a + d = 0. 



that 


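43. Verify that det A = det B + det C, where

\[
A = \begin{bmatrix} a_{11} & a_{12} & u_1 + v_1 \\ a_{21} & a_{22} & u_2 + v_2 \\ a_{31} & a_{32} & u_3 + v_3 \end{bmatrix}, \quad
B = \begin{bmatrix} a_{11} & a_{12} & u_1 \\ a_{21} & a_{22} & u_2 \\ a_{31} & a_{32} & u_3 \end{bmatrix}, \quad
C = \begin{bmatrix} a_{11} & a_{12} & v_1 \\ a_{21} & a_{22} & v_2 \\ a_{31} & a_{32} & v_3 \end{bmatrix}
\]

Note, however, that A is not the same as B + C.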


44. Right-multiplication by an elementary matrix E affects the 
columns of A in the same way that left-multiplication affects 
the rows. Use Theorems 5 and 3 and the obvious fact that E T 
is another elementary matrix to show that 

det AE = (det E)(det A)

Do not use Theorem 6. 


45. [M] Compute $\det A^T A$ and $\det AA^T$ for several random $4 \times 5$ matrices and several random $5 \times 6$ matrices. What can you say about $A^T A$ and $AA^T$ when A has more columns than rows?


46. [M] If det A is close to zero, is the matrix A nearly singular? Experiment with the nearly singular $4 \times 4$ matrix

\[
A = \begin{bmatrix} 4 & 0 & -7 & -7 \\ -6 & 1 & 11 & 9 \\ 7 & -5 & 10 & 19 \\ -1 & 2 & 3 & -1 \end{bmatrix}
\]

Compute the determinants of A, 10A, and 0.1A. In contrast, compute the condition numbers of these matrices. Repeat these calculations when A is the $4 \times 4$ identity matrix. Discuss your results.


Verify that det AB = (det A)(det B) for the matrices in Exercises 37 and 38. (Do not use Theorem 6.)

37. A = 
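SOLUTIONS TO PRACTICE PROBLEMS


1. Perform row replacements to create zeros in the first column, and then create a row of zeros.

\[
\begin{vmatrix} 1 & -3 & 1 & -2 \\ 2 & -5 & -1 & -2 \\ 0 & -4 & 5 & 1 \\ -3 & 10 & -6 & 8 \end{vmatrix}
= \begin{vmatrix} 1 & -3 & 1 & -2 \\ 0 & 1 & -3 & 2 \\ 0 & -4 & 5 & 1 \\ 0 & 1 & -3 & 2 \end{vmatrix}
= \begin{vmatrix} 1 & -3 & 1 & -2 \\ 0 & 1 & -3 & 2 \\ 0 & -4 & 5 & 1 \\ 0 & 0 & 0 & 0 \end{vmatrix} = 0
\]

2. 
\[
\begin{aligned}
\det[\, \mathbf{v}_1 \;\; \mathbf{v}_2 \;\; \mathbf{v}_3 \,]
&= \begin{vmatrix} 5 & -3 & 2 \\ -7 & 3 & -7 \\ 9 & -5 & 5 \end{vmatrix}
= \begin{vmatrix} 5 & -3 & 2 \\ -2 & 0 & -5 \\ 9 & -5 & 5 \end{vmatrix}
&& \text{Row 1 added to row 2} \\
&= -(-3) \begin{vmatrix} -2 & -5 \\ 9 & 5 \end{vmatrix} - (-5) \begin{vmatrix} 5 & 2 \\ -2 & -5 \end{vmatrix}
&& \text{Cofactors of column 2} \\
&= 3 \cdot (35) + 5 \cdot (-21) = 0
\end{aligned}
\]

By Theorem 4, the matrix $[\, \mathbf{v}_1 \;\; \mathbf{v}_2 \;\; \mathbf{v}_3 \,]$ is not invertible. The columns are linearly dependent, by the Invertible Matrix Theorem.

3. Recall that det I = 1. By Theorem 6, det(AA) = (det A)(det A). Putting these two observations together results in

\[
1 = \det I = \det A^2 = \det(AA) = (\det A)(\det A) = (\det A)^2
\]

Taking the square root of both sides establishes that $\det A = \pm 1$.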


3.3 CRAMER'S RULE, VOLUME, AND LINEAR TRANSFORMATIONS 


This section applies the theory of the preceding sections to obtain important theoretical 
formulas and a geometric interpretation of the determinant. 


Cramer’s Rule 


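Cramer’s rule is needed in a variety of theoretical calculations. For instance, it can be used to study how the solution of $A\mathbf{x} = \mathbf{b}$ is affected by changes in the entries of $\mathbf{b}$. However, the formula is inefficient for hand calculations, except for $2 \times 2$ or perhaps $3 \times 3$ matrices.

For any $n \times n$ matrix A and any $\mathbf{b}$ in $\mathbb{R}^n$, let $A_i(\mathbf{b})$ be the matrix obtained from A by replacing column i by the vector $\mathbf{b}$.

\[
A_i(\mathbf{b}) = \begin{bmatrix} \mathbf{a}_1 & \cdots & \mathbf{b} & \cdots & \mathbf{a}_n \end{bmatrix} \qquad (\mathbf{b} \text{ in column } i)
\]


THEOREM 7


Cramer's Rule


Let A be an invertible $n \times n$ matrix. For any $\mathbf{b}$ in $\mathbb{R}^n$, the unique solution $\mathbf{x}$ of $A\mathbf{x} = \mathbf{b}$ has entries given by

\[
x_i = \frac{\det A_i(\mathbf{b})}{\det A}, \qquad i = 1, 2, \ldots, n \tag{1}
\]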
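PROOF Denote the columns of A by $\mathbf{a}_1, \ldots, \mathbf{a}_n$ and the columns of the $n \times n$ identity matrix I by $\mathbf{e}_1, \ldots, \mathbf{e}_n$. If $A\mathbf{x} = \mathbf{b}$, the definition of matrix multiplication shows that

\[
\begin{aligned}
A \cdot I_i(\mathbf{x}) &= A \begin{bmatrix} \mathbf{e}_1 & \cdots & \mathbf{x} & \cdots & \mathbf{e}_n \end{bmatrix}
= \begin{bmatrix} A\mathbf{e}_1 & \cdots & A\mathbf{x} & \cdots & A\mathbf{e}_n \end{bmatrix} \\
&= \begin{bmatrix} \mathbf{a}_1 & \cdots & \mathbf{b} & \cdots & \mathbf{a}_n \end{bmatrix} = A_i(\mathbf{b})
\end{aligned}
\]

By the multiplicative property of determinants,

\[
(\det A)(\det I_i(\mathbf{x})) = \det A_i(\mathbf{b})
\]

The second determinant on the left is simply $x_i$. (Make a cofactor expansion along the ith row.) Hence $(\det A) \cdot x_i = \det A_i(\mathbf{b})$. This proves (1) because A is invertible and $\det A \neq 0$. ■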


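EXAMPLE 1 Use Cramer’s rule to solve the system

\[
\begin{aligned}
3x_1 - 2x_2 &= 6 \\
-5x_1 + 4x_2 &= 8
\end{aligned}
\]

SOLUTION View the system as $A\mathbf{x} = \mathbf{b}$. Using the notation introduced above,

\[
A = \begin{bmatrix} 3 & -2 \\ -5 & 4 \end{bmatrix}, \quad
A_1(\mathbf{b}) = \begin{bmatrix} 6 & -2 \\ 8 & 4 \end{bmatrix}, \quad
A_2(\mathbf{b}) = \begin{bmatrix} 3 & 6 \\ -5 & 8 \end{bmatrix}
\]

Since det A = 2, the system has a unique solution. By Cramer’s rule,

\[
x_1 = \frac{\det A_1(\mathbf{b})}{\det A} = \frac{24 + 16}{2} = 20
\]
\[
x_2 = \frac{\det A_2(\mathbf{b})}{\det A} = \frac{24 + 30}{2} = 27
\]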

Application to Engineering 

A number of important engineering problems, particularly in electrical engineering and control theory, can be analyzed by Laplace transforms. This approach converts an appropriate system of linear differential equations into a system of linear algebraic equations whose coefficients involve a parameter. The next example illustrates the type of algebraic system that may arise.


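EXAMPLE 2 Consider the following system in which s is an unspecified parameter. Determine the values of s for which the system has a unique solution, and use Cramer’s rule to describe the solution.

\[
\begin{aligned}
3s x_1 - 2x_2 &= 4 \\
-6x_1 + s x_2 &= 1
\end{aligned}
\]

SOLUTION View the system as $A\mathbf{x} = \mathbf{b}$. Then

\[
A = \begin{bmatrix} 3s & -2 \\ -6 & s \end{bmatrix}, \quad
A_1(\mathbf{b}) = \begin{bmatrix} 4 & -2 \\ 1 & s \end{bmatrix}, \quad
A_2(\mathbf{b}) = \begin{bmatrix} 3s & 4 \\ -6 & 1 \end{bmatrix}
\]

Since

\[
\det A = 3s^2 - 12 = 3(s + 2)(s - 2)
\]

the system has a unique solution precisely when $s \neq \pm 2$. For such an s, the solution is $(x_1, x_2)$, where

\[
x_1 = \frac{\det A_1(\mathbf{b})}{\det A} = \frac{4s + 2}{3(s + 2)(s - 2)}
\]
\[
x_2 = \frac{\det A_2(\mathbf{b})}{\det A} = \frac{3s + 24}{3(s + 2)(s - 2)} = \frac{s + 8}{(s + 2)(s - 2)}
\]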
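As a quick sanity check on the formulas of Example 2, one can substitute a sample parameter value. The choice s = 1 below is arbitrary and is not taken from the text.

```latex
\[
x_1 = \frac{4(1) + 2}{3(1 + 2)(1 - 2)} = \frac{6}{-9} = -\frac{2}{3},
\qquad
x_2 = \frac{1 + 8}{(1 + 2)(1 - 2)} = \frac{9}{-3} = -3
\]
\[
3(1)\left(-\tfrac{2}{3}\right) - 2(-3) = -2 + 6 = 4,
\qquad
-6\left(-\tfrac{2}{3}\right) + (1)(-3) = 4 - 3 = 1
\]
```

Both equations of the system are satisfied, as expected for any s other than 2 and -2.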
A Formula for $A^{-1}$

Cramer’s rule leads easily to a general formula for the inverse of an $n \times n$ matrix A. The jth column of $A^{-1}$ is a vector $\mathbf{x}$ that satisfies

\[
A\mathbf{x} = \mathbf{e}_j
\]

where $\mathbf{e}_j$ is the jth column of the identity matrix, and the ith entry of $\mathbf{x}$ is the $(i, j)$-entry of $A^{-1}$. By Cramer’s rule,

\[
\left\{ (i, j)\text{-entry of } A^{-1} \right\} = x_i = \frac{\det A_i(\mathbf{e}_j)}{\det A} \tag{2}
\]

Recall that $A_{ji}$ denotes the submatrix of A formed by deleting row j and column i. A cofactor expansion down column i of $A_i(\mathbf{e}_j)$ shows that

\[
\det A_i(\mathbf{e}_j) = (-1)^{i+j} \det A_{ji} = C_{ji} \tag{3}
\]

where $C_{ji}$ is a cofactor of A. By (2), the $(i, j)$-entry of $A^{-1}$ is the cofactor $C_{ji}$ divided by det A. [Note that the subscripts on $C_{ji}$ are the reverse of $(i, j)$.] Thus

\[
A^{-1} = \frac{1}{\det A} \begin{bmatrix} C_{11} & C_{21} & \cdots & C_{n1} \\ C_{12} & C_{22} & \cdots & C_{n2} \\ \vdots & \vdots & & \vdots \\ C_{1n} & C_{2n} & \cdots & C_{nn} \end{bmatrix} \tag{4}
\]

The matrix of cofactors on the right side of (4) is called the adjugate (or classical adjoint) of A, denoted by adj A. (The term adjoint also has another meaning in advanced texts on linear transformations.) The next theorem simply restates (4).


THEOREM 8 An Inverse Formula

Let A be an invertible $n \times n$ matrix. Then

\[
A^{-1} = \frac{1}{\det A} \operatorname{adj} A
\]

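EXAMPLE 3 Find the inverse of the matrix $A = \begin{bmatrix} 2 & 1 & 3 \\ 1 & -1 & 1 \\ 1 & 4 & -2 \end{bmatrix}$.

SOLUTION The nine cofactors are

\[
\begin{aligned}
C_{11} &= +\begin{vmatrix} -1 & 1 \\ 4 & -2 \end{vmatrix} = -2, &
C_{12} &= -\begin{vmatrix} 1 & 1 \\ 1 & -2 \end{vmatrix} = 3, &
C_{13} &= +\begin{vmatrix} 1 & -1 \\ 1 & 4 \end{vmatrix} = 5 \\
C_{21} &= -\begin{vmatrix} 1 & 3 \\ 4 & -2 \end{vmatrix} = 14, &
C_{22} &= +\begin{vmatrix} 2 & 3 \\ 1 & -2 \end{vmatrix} = -7, &
C_{23} &= -\begin{vmatrix} 2 & 1 \\ 1 & 4 \end{vmatrix} = -7 \\
C_{31} &= +\begin{vmatrix} 1 & 3 \\ -1 & 1 \end{vmatrix} = 4, &
C_{32} &= -\begin{vmatrix} 2 & 3 \\ 1 & 1 \end{vmatrix} = 1, &
C_{33} &= +\begin{vmatrix} 2 & 1 \\ 1 & -1 \end{vmatrix} = -3
\end{aligned}
\]

The adjugate matrix is the transpose of the matrix of cofactors. [For instance, $C_{12}$ goes in the (2,1) position.] Thus

\[
\operatorname{adj} A = \begin{bmatrix} -2 & 14 & 4 \\ 3 & -7 & 1 \\ 5 & -7 & -3 \end{bmatrix}
\]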


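We could compute det A directly, but the following computation provides a check on the calculations above and produces det A:

\[
(\operatorname{adj} A) \cdot A =
\begin{bmatrix} -2 & 14 & 4 \\ 3 & -7 & 1 \\ 5 & -7 & -3 \end{bmatrix}
\begin{bmatrix} 2 & 1 & 3 \\ 1 & -1 & 1 \\ 1 & 4 & -2 \end{bmatrix}
= \begin{bmatrix} 14 & 0 & 0 \\ 0 & 14 & 0 \\ 0 & 0 & 14 \end{bmatrix} = 14I
\]

Since (adj A)A = 14I, Theorem 8 shows that det A = 14 and

\[
A^{-1} = \frac{1}{14} \begin{bmatrix} -2 & 14 & 4 \\ 3 & -7 & 1 \\ 5 & -7 & -3 \end{bmatrix}
= \begin{bmatrix} -1/7 & 1 & 2/7 \\ 3/14 & -1/2 & 1/14 \\ 5/14 & -1/2 & -3/14 \end{bmatrix}
\]
■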

NUMERICAL NOTES

Theorem 8 is useful mainly for theoretical calculations. The formula for $A^{-1}$ permits one to deduce properties of the inverse without actually calculating it. Except for special cases, the algorithm in Section 2.2 gives a much better way to compute $A^{-1}$, if the inverse is really needed.

Cramer’s rule is also a theoretical tool. It can be used to study how sensitive 
the solution of Ax = b is to changes in an entry in b or in A (perhaps due 
to experimental error when acquiring the entries for b or A). When A is a 
3x3 matrix with complex entries, Cramer’s rule is sometimes selected for hand 
computation because row reduction of [ A b ] with complex arithmetic can be 
messy, and the determinants are fairly easy to compute. For a larger n x n matrix 
(real or complex), Cramer’s rule is hopelessly inefficient. Computing just one 
determinant takes about as much work as solving Ax = b by row reduction. 


Determinants as Area or Volume 

In the next application, we verify the geometric interpretation of determinants described in the chapter introduction. Although a general discussion of length and distance in $\mathbb{R}^n$ will not be given until Chapter 6, we assume here that the usual Euclidean concepts of length, area, and volume are already understood for $\mathbb{R}^2$ and $\mathbb{R}^3$.


THEOREM 9 


If A is a $2 \times 2$ matrix, the area of the parallelogram determined by the columns of A is $|\det A|$. If A is a $3 \times 3$ matrix, the volume of the parallelepiped determined by the columns of A is $|\det A|$.



PROOF The theorem is obviously true for any $2 \times 2$ diagonal matrix:

\[
\left| \det \begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix} \right| = |ad| = \{\text{area of rectangle}\}
\]

See Figure 1. It will suffice to show that any $2 \times 2$ matrix $A = [\, \mathbf{a}_1 \;\; \mathbf{a}_2 \,]$ can be transformed into a diagonal matrix in a way that changes neither the area of the associated parallelogram nor $|\det A|$. From Section 3.2, we know that the absolute value of the determinant is unchanged when two columns are interchanged or a multiple of one column is added to another. And it is easy to see that such operations suffice to transform A into a diagonal matrix. Column interchanges do not change the parallelogram at all. So it suffices to prove the following simple geometric observation that applies to vectors in $\mathbb{R}^2$ or $\mathbb{R}^3$:


FIGURE 1 Area = $|ad|$.
Let $\mathbf{a}_1$ and $\mathbf{a}_2$ be nonzero vectors. Then for any scalar c, the area of the parallelogram determined by $\mathbf{a}_1$ and $\mathbf{a}_2$ equals the area of the parallelogram determined by $\mathbf{a}_1$ and $\mathbf{a}_2 + c\mathbf{a}_1$.


To prove this statement, we may assume that $\mathbf{a}_2$ is not a multiple of $\mathbf{a}_1$, for otherwise the two parallelograms would be degenerate and have zero area. If L is the line through $\mathbf{0}$ and $\mathbf{a}_1$, then $\mathbf{a}_2 + L$ is the line through $\mathbf{a}_2$ parallel to L, and $\mathbf{a}_2 + c\mathbf{a}_1$ is on this line. See Figure 2. The points $\mathbf{a}_2$ and $\mathbf{a}_2 + c\mathbf{a}_1$ have the same perpendicular distance to L. Hence the two parallelograms in Figure 2 have the same area, since they share the base from $\mathbf{0}$ to $\mathbf{a}_1$. This completes the proof for $\mathbb{R}^2$.




FIGURE 2 Two parallelograms of equal area.


FIGURE 3 Volume = $|abc|$.

The proof for $\mathbb{R}^3$ is similar. The theorem is obviously true for a $3 \times 3$ diagonal matrix. See Figure 3. And any $3 \times 3$ matrix A can be transformed into a diagonal matrix using column operations that do not change $|\det A|$. (Think about doing row operations on $A^T$.) So it suffices to show that these operations do not affect the volume of the parallelepiped determined by the columns of A.

A parallelepiped is shown in Figure 4 as a shaded box with two sloping sides. Its volume is the area of the base in the plane $\operatorname{Span}\{\mathbf{a}_1, \mathbf{a}_3\}$ times the altitude of $\mathbf{a}_2$ above $\operatorname{Span}\{\mathbf{a}_1, \mathbf{a}_3\}$. Any vector $\mathbf{a}_2 + c\mathbf{a}_1$ has the same altitude because $\mathbf{a}_2 + c\mathbf{a}_1$ lies in the plane $\mathbf{a}_2 + \operatorname{Span}\{\mathbf{a}_1, \mathbf{a}_3\}$, which is parallel to $\operatorname{Span}\{\mathbf{a}_1, \mathbf{a}_3\}$. Hence the volume of the parallelepiped is unchanged when $[\, \mathbf{a}_1 \;\; \mathbf{a}_2 \;\; \mathbf{a}_3 \,]$ is changed to $[\, \mathbf{a}_1 \;\; \mathbf{a}_2 + c\mathbf{a}_1 \;\; \mathbf{a}_3 \,]$. Thus a column replacement operation does not affect the volume of the parallelepiped. Since column interchanges have no effect on the volume, the proof is complete. ■


FIGURE 4 Two parallelepipeds of equal volume.


EXAMPLE 4 Calculate the area of the parallelogram determined by the points (-2, -2), (0, 3), (4, -1), and (6, 4). See Figure 5(a).

SOLUTION First translate the parallelogram to one having the origin as a vertex. For example, subtract the vertex (-2, -2) from each of the four vertices. The new parallelogram has the same area, and its vertices are (0, 0), (2, 5), (6, 1), and (8, 6). See Figure 5(b). This parallelogram is determined by the columns of

\[
A = \begin{bmatrix} 2 & 6 \\ 5 & 1 \end{bmatrix}
\]

Since $|\det A| = |-28|$, the area of the parallelogram is 28. ■


FIGURE 5 Translating a parallelogram does not change its area.
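The translate-then-take-a-determinant recipe of Example 4 fits in a few lines. This is a small sketch; the function name `parallelogram_area` and its argument convention (one vertex plus its two adjacent vertices) are assumptions made for the illustration.

```python
def parallelogram_area(p, q, r):
    """Area of the parallelogram with vertex p and adjacent vertices q and r.

    Translating by -p moves p to the origin; the area is then |det [u v]|
    with u = q - p and v = r - p as columns.
    """
    u = (q[0] - p[0], q[1] - p[1])
    v = (r[0] - p[0], r[1] - p[1])
    return abs(u[0] * v[1] - v[0] * u[1])


# Vertices from Example 4: (-2, -2) with adjacent vertices (0, 3) and (4, -1).
print(parallelogram_area((-2, -2), (0, 3), (4, -1)))
```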


Linear Transformations 

Determinants can be used to describe an important geometric property of linear transformations in the plane and in $\mathbb{R}^3$. If T is a linear transformation and S is a set in the domain of T, let T(S) denote the set of images of points in S. We are interested in how the area (or volume) of T(S) compares with the area (or volume) of the original set S. For convenience, when S is a region bounded by a parallelogram, we also refer to S as a parallelogram.


THEOREM 10

Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be the linear transformation determined by a $2 \times 2$ matrix A. If S is a parallelogram in $\mathbb{R}^2$, then

\[
\{\text{area of } T(S)\} = |\det A| \cdot \{\text{area of } S\} \tag{5}
\]

If T is determined by a $3 \times 3$ matrix A, and if S is a parallelepiped in $\mathbb{R}^3$, then

\[
\{\text{volume of } T(S)\} = |\det A| \cdot \{\text{volume of } S\} \tag{6}
\]


PROOF Consider the $2 \times 2$ case, with $A = [\,\mathbf{a}_1 \;\; \mathbf{a}_2\,]$. A parallelogram at the origin in
$\mathbb{R}^2$ determined by vectors $\mathbf{b}_1$ and $\mathbf{b}_2$ has the form

$$S = \{\, s_1\mathbf{b}_1 + s_2\mathbf{b}_2 : 0 \le s_1 \le 1,\ 0 \le s_2 \le 1 \,\}$$

The image of $S$ under $T$ consists of points of the form

$$T(s_1\mathbf{b}_1 + s_2\mathbf{b}_2) = s_1T(\mathbf{b}_1) + s_2T(\mathbf{b}_2) = s_1A\mathbf{b}_1 + s_2A\mathbf{b}_2$$

where $0 \le s_1 \le 1$, $0 \le s_2 \le 1$. It follows that $T(S)$ is the parallelogram determined
by the columns of the matrix $[\,A\mathbf{b}_1 \;\; A\mathbf{b}_2\,]$. This matrix can be written as $AB$, where
$B = [\,\mathbf{b}_1 \;\; \mathbf{b}_2\,]$. By Theorem 9 and the product theorem for determinants,

$$\{\text{area of } T(S)\} = |\det AB| = |\det A| \cdot |\det B| = |\det A| \cdot \{\text{area of } S\} \tag{7}$$

An arbitrary parallelogram has the form $\mathbf{p} + S$, where $\mathbf{p}$ is a vector and $S$ is a
parallelogram at the origin, as above. It is easy to see that $T$ transforms $\mathbf{p} + S$ into $T(\mathbf{p}) + T(S)$.
(See Exercise 26.) Since translation does not affect the area of a set,

$$\begin{aligned}
\{\text{area of } T(\mathbf{p} + S)\} &= \{\text{area of } T(\mathbf{p}) + T(S)\} \\
&= \{\text{area of } T(S)\} && \text{Translation} \\
&= |\det A| \cdot \{\text{area of } S\} && \text{By equation (7)} \\
&= |\det A| \cdot \{\text{area of } \mathbf{p} + S\} && \text{Translation}
\end{aligned}$$

This shows that (5) holds for all parallelograms in $\mathbb{R}^2$. The proof of (6) for the $3 \times 3$
case is analogous. ■
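Equation (7) can be checked numerically on a small case. The sketch below is an illustration only: the matrix `A` and the vectors `b1`, `b2` are arbitrary values chosen for the check (they do not come from the text), and the helper names are hypothetical.

```python
def mat_vec(A, v):
    """Multiply a 2x2 matrix A (given as a list of rows) by a vector v."""
    return (A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1])


def det_cols(u, v):
    """Determinant of the 2x2 matrix whose columns are u and v."""
    return u[0] * v[1] - v[0] * u[1]


def area(u, v):
    """Area of the parallelogram at the origin determined by u and v."""
    return abs(det_cols(u, v))


# Illustrative data (an assumption, not taken from the text).
A = [[3, 1], [2, 4]]
b1, b2 = (1, 2), (4, 1)

det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
area_S = area(b1, b2)                             # |det B|
area_TS = area(mat_vec(A, b1), mat_vec(A, b2))    # |det [Ab1 Ab2]|

print(det_A, area_S, area_TS)
```

Here the image parallelogram is built directly from the columns $A\mathbf{b}_1$ and $A\mathbf{b}_2$, so the comparison of `area_TS` with `abs(det_A) * area_S` tests the product rule $|\det AB| = |\det A| \cdot |\det B|$ used in the proof.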

When we attempt to generalize Theorem 10 to a region in $\mathbb{R}^2$ or $\mathbb{R}^3$ that is not
bounded by straight lines or planes, we must face the problem of how to define and
compute its area or volume. This is a question studied in calculus, and we shall only
outline the basic idea for $\mathbb{R}^2$. If $R$ is a planar region that has a finite area, then $R$ can
be approximated by a grid of small squares that lie inside $R$. By making the squares
sufficiently small, the area of $R$ may be approximated as closely as desired by the sum
of the areas of the small squares. See Figure 6.



FIGURE 6 Approximating a planar region by a union of squares. 
The approximation improves as the grid becomes finer. 


If $T$ is a linear transformation associated with a $2 \times 2$ matrix $A$, then the image of
a planar region $R$ under $T$ is approximated by the images of the small squares inside
$R$. The proof of Theorem 10 shows that each such image is a parallelogram whose area
is $|\det A|$ times the area of the square. If $R'$ is the union of the squares inside $R$, then
the area of $T(R')$ is $|\det A|$ times the area of $R'$. See Figure 7. Also, the area of $T(R')$
is close to the area of $T(R)$. An argument involving a limiting process may be given to
justify the following generalization of Theorem 10.




FIGURE 7 Approximating T(R) by a union of parallelograms.

The conclusions of Theorem 10 hold whenever $S$ is a region in $\mathbb{R}^2$ with finite area
or a region in $\mathbb{R}^3$ with finite volume.



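EXAMPLE 5 Let $a$ and $b$ be positive numbers. Find the area of the region $E$ bounded
by the ellipse whose equation is

$$\frac{x_1^2}{a^2} + \frac{x_2^2}{b^2} = 1$$

SOLUTION We claim that $E$ is the image of the unit disk $D$ under the linear
transformation $T$ determined by the matrix $A = \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}$, because if
$\mathbf{u} = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}$, $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$, and $\mathbf{x} = A\mathbf{u}$, then

$$u_1 = \frac{x_1}{a} \quad \text{and} \quad u_2 = \frac{x_2}{b}$$

It follows that $\mathbf{u}$ is in the unit disk, with $u_1^2 + u_2^2 \le 1$, if and only if $\mathbf{x}$ is in $E$, with
$(x_1/a)^2 + (x_2/b)^2 \le 1$. By the generalization of Theorem 10,

$$\begin{aligned}
\{\text{area of ellipse}\} &= \{\text{area of } T(D)\} \\
&= |\det A| \cdot \{\text{area of } D\} \\
&= ab \cdot \pi(1)^2 = \pi ab
\end{aligned}$$ ■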
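The grid idea outlined before Example 5 can be tried numerically. The following is a rough sketch under stated assumptions: the function name `ellipse_area_grid`, the values `a = 3` and `b = 2`, and the grid size are illustrative choices, and the sketch counts grid cells whose centers lie in the region (a common variant) rather than cells lying entirely inside it.

```python
import math


def ellipse_area_grid(a, b, n=400):
    """Approximate the area of (x1/a)^2 + (x2/b)^2 <= 1 with an n-by-n grid.

    The bounding box [-a, a] x [-b, b] is cut into n*n small rectangles, and
    a rectangle is counted when its center lies inside the ellipse.
    """
    dx = 2 * a / n
    dy = 2 * b / n
    count = 0
    for i in range(n):
        x = -a + (i + 0.5) * dx
        for j in range(n):
            y = -b + (j + 0.5) * dy
            if (x / a) ** 2 + (y / b) ** 2 <= 1:
                count += 1
    return count * dx * dy


a, b = 3.0, 2.0                      # illustrative semi-axes
approx = ellipse_area_grid(a, b)
exact = abs(a * b) * math.pi         # |det A| times the area of the unit disk
print(approx, exact)
```

The two printed values should agree more closely as `n` grows, which mirrors the remark that the approximation improves as the grid becomes finer.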


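PRACTICE PROBLEM

Let $S$ be the parallelogram determined by the vectors $\mathbf{b}_1 = \begin{bmatrix} 1 \\ 3 \end{bmatrix}$ and $\mathbf{b}_2 = \begin{bmatrix} 5 \\ 1 \end{bmatrix}$, and
let $A = \begin{bmatrix} 1 & -.1 \\ 0 & 2 \end{bmatrix}$. Compute the area of the image of $S$ under the mapping $\mathbf{x} \mapsto A\mathbf{x}$.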


3.3 EXERCISES 


Use Cramer’s rule to compute the solutions of the systems in 
Exercises 1-6. 

1. $5x_1 + 7x_2 = 3$
   $2x_1 + 4x_2 = 1$

2. $4x_1 + x_2 = 6$
   $3x_1 + 2x_2 = 7$


9. $sx_1 + 2sx_2 = -1$
   $3x_1 + 6sx_2 = 4$

10. $sx_1 - 2x_2 = 1$
    $4sx_1 + 4sx_2 = 2$

In Exercises 11-16, compute the adjugate of the given matrix, and 
then use Theorem 8 to give the inverse of the matrix. 


3. $3x_1 - 2x_2 = 3$
   $-4x_1 + 6x_2 = -5$

4. $-5x_1 + 2x_2 = 9$
   $3x_1 - x_2 = -4$



Xi + X2 



6. X\ T 3x2 T X 3 — 4 



+ 2x3 — 0 


—X\ + 




3xi T X 2 — 2 


In Exercises 7-10, determine the values of the parameter s 
for which the system has a unique solution, and describe the 
solution. 

7. $6sx_1 + 4x_2 = 5$
   $9x_1 + 2sx_2 = -2$

8. $3sx_1 + 5x_2 = 3$
   $12x_1 + 5sx_2 = 2$



17. Show that if $A$ is $2 \times 2$, then Theorem 8 gives the same
formula for $A^{-1}$ as that given by Theorem 4 in Section 2.2.

18. Suppose that all the entries in $A$ are integers and $\det A = 1$.
Explain why all the entries in $A^{-1}$ are integers.

In Exercises 19-22, find the area of the parallelogram whose
vertices are listed.

19. (0,0), (5,2), (6,4), (11,6) 

20. (0,0), (-2, 4), (4,-5), (2,-1) 

21. (-2,0), (0,3), (1,3), (-1,0) 

22. (0,-2), (5,-2), (-3,1), (2,1) 


23. Find the volume of the parallelepiped with one vertex at
the origin and adjacent vertices at $(1, 0, -3)$, $(1, 2, 4)$, and
$(5, 1, 0)$.

24. Find the volume of the parallelepiped with one vertex at
the origin and adjacent vertices at $(1, 3, 0)$, $(-2, 0, 2)$, and
$(-1, 3, -1)$.

25. Use the concept of volume to explain why the determinant of
a $3 \times 3$ matrix $A$ is zero if and only if $A$ is not invertible. Do
not appeal to Theorem 4 in Section 3.2. [Hint: Think about
the columns of $A$.]

26. Let $T : \mathbb{R}^m \to \mathbb{R}^n$ be a linear transformation, and let $\mathbf{p}$ be a
vector and $S$ a set in $\mathbb{R}^m$. Show that the image of $\mathbf{p} + S$ under
$T$ is the translated set $T(\mathbf{p}) + T(S)$ in $\mathbb{R}^n$.


27. Let S be the parallelogram determined by the vectors 


bi = 


2 

3 


and b 2 = 


2 

5 


, and let A = 


6 

3 


-3 

2 


Compute the area of the image of S under the mapping 
x i-> Ax. 


28. Repeat Exercise 27 with bi = 


4 

7 


b? = 


0 

1 


, and 


A = 


5 

1 


2 

1 


29. Find a formula for the area of the triangle whose vertices are
$\mathbf{0}$, $\mathbf{v}_1$, and $\mathbf{v}_2$ in $\mathbb{R}^2$.

30. Let $R$ be the triangle with vertices at $(x_1, y_1)$, $(x_2, y_2)$, and
$(x_3, y_3)$. Show that

$$\{\text{area of triangle}\} = \frac{1}{2} \left| \det \begin{bmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{bmatrix} \right|$$

[Hint: Translate $R$ to the origin by subtracting one of the
vertices, and use Exercise 29.]

31. Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be the linear transformation determined
by the matrix $A = \begin{bmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{bmatrix}$, where $a$, $b$, and $c$ are
positive numbers. Let $S$ be the unit ball, whose bounding
surface has the equation $x_1^2 + x_2^2 + x_3^2 = 1$.

a. Show that $T(S)$ is bounded by the ellipsoid with the
equation $\dfrac{x_1^2}{a^2} + \dfrac{x_2^2}{b^2} + \dfrac{x_3^2}{c^2} = 1$.

b. Use the fact that the volume of the unit ball is $4\pi/3$
to determine the volume of the region bounded by the
ellipsoid in part (a).

32. Let $S$ be the tetrahedron in $\mathbb{R}^3$ with vertices at the vectors $\mathbf{0}$,
$\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$, and let $S'$ be the tetrahedron with vertices at
vectors $\mathbf{0}$, $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$. See the figure.

a. Describe a linear transformation that maps $S$ onto $S'$.

b. Find a formula for the volume of the tetrahedron $S'$ using
the fact that

{volume of S} = (1/3) • {area of base} • {height}

33. [M] Test the inverse formula of Theorem 8 for a random
$4 \times 4$ matrix $A$. Use your matrix program to compute the
cofactors of the $3 \times 3$ submatrices, construct the adjugate,
and set $B = (\operatorname{adj} A)/(\det A)$. Then compute $B - \operatorname{inv}(A)$,
where $\operatorname{inv}(A)$ is the inverse of $A$ as computed by the matrix
program. Use floating point arithmetic with the maximum
possible number of decimal places. Report your results.

34. [M] Test Cramer’s rule for a random $4 \times 4$ matrix $A$ and a
random $4 \times 1$ vector $\mathbf{b}$. Compute each entry in the solution of
$A\mathbf{x} = \mathbf{b}$, and compare these entries with the entries in $A^{-1}\mathbf{b}$.
Write the command (or keystrokes) for your matrix program
that uses Cramer’s rule to produce the second entry of $\mathbf{x}$.

35. [M] If your version of MATLAB has the flops command,
use it to count the number of floating point operations to
compute $A^{-1}$ for a random $30 \times 30$ matrix. Compare this number
with the number of flops needed to form $(\operatorname{adj} A)/(\det A)$.


SOLUTION TO PRACTICE PROBLEM

The area of $S$ is $\left| \det \begin{bmatrix} 1 & 5 \\ 3 & 1 \end{bmatrix} \right| = 14$, and $\det A = 2$. By Theorem 10, the area of the
image of $S$ under the mapping $\mathbf{x} \mapsto A\mathbf{x}$ is

$$|\det A| \cdot \{\text{area of } S\} = 2 \cdot 14 = 28$$

CHAPTER 3 SUPPLEMENTARY EXERCISES


1. Mark each statement True or False. Justify each answer. 

Assume that all matrices here are square. 

a. If A is a 2 x 2 matrix with a zero determinant, then one 
column of A is a multiple of the other. 

b. If two rows of a 3x3 matrix A are the same, then 
det A = 0. 

c. If A is a 3 x 3 matrix, then det 5A = 5 det A. 

d. If A and B are n x n matrices, with det A = 2 and 
det B = 3, then det(A + B) = 5. 

e. If $A$ is $n \times n$ and $\det A = 2$, then $\det A^3 = 6$.

f. If B is produced by interchanging two rows of A , then 
det B = det A. 

g. If B is produced by multiplying row 3 of A by 5, then 
det B = 5 • det A . 

h. If B is formed by adding to one row of A a linear 
combination of the other rows, then det B = det A. 

i. $\det A^T = -\det A$.

j. $\det(-A) = -\det A$.

k. $\det A^TA \ge 0$.

l. Any system of n linear equations in n variables can be 
solved by Cramer’s rule. 

m. If $\mathbf{u}$ and $\mathbf{v}$ are in $\mathbb{R}^2$ and $\det [\,\mathbf{u} \;\; \mathbf{v}\,] = 10$, then the area
of the triangle in the plane with vertices at $\mathbf{0}$, $\mathbf{u}$, and $\mathbf{v}$
is 10.

n. If $A^3 = 0$, then $\det A = 0$.

o. If $A$ is invertible, then $\det A^{-1} = \det A$.

p. If $A$ is invertible, then $(\det A)(\det A^{-1}) = 1$.


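Use row operations to show that the determinants in Exercises 2-4
are all zero.

2. $\begin{vmatrix} 12 & 13 & 14 \\ 15 & 16 & 17 \\ 18 & 19 & 20 \end{vmatrix}$

3. $\begin{vmatrix} 1 & a & b + c \\ 1 & b & a + c \\ 1 & c & a + b \end{vmatrix}$

4. $\begin{vmatrix} a & b & c \\ a + x & b + x & c + x \\ a + y & b + y & c + y \end{vmatrix}$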

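Compute the determinants in Exercises 5 and 6.

5. $\begin{vmatrix} 9 & 1 & 9 & 9 & 9 \\ 9 & 0 & 9 & 9 & 2 \\ 4 & 0 & 0 & 5 & 0 \\ 9 & 0 & 3 & 9 & 0 \\ 6 & 0 & 0 & 7 & 0 \end{vmatrix}$

6. $\begin{vmatrix} 4 & 8 & 8 & 8 & 5 \\ 0 & 1 & 0 & 0 & 0 \\ 6 & 8 & 8 & 8 & 7 \\ 0 & 8 & 8 & 3 & 0 \\ 0 & 8 & 2 & 0 & 0 \end{vmatrix}$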


7. Show that the equation of the line in $\mathbb{R}^2$ through distinct
points $(x_1, y_1)$ and $(x_2, y_2)$ can be written as

$$\det \begin{bmatrix} 1 & x & y \\ 1 & x_1 & y_1 \\ 1 & x_2 & y_2 \end{bmatrix} = 0$$

8. Find a $3 \times 3$ determinant equation similar to that in Exercise 7
that describes the equation of the line through $(x_1, y_1)$ with
slope $m$.


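Exercises 9 and 10 concern determinants of the following
Vandermonde matrices.

$$T = \begin{bmatrix} 1 & a & a^2 \\ 1 & b & b^2 \\ 1 & c & c^2 \end{bmatrix}, \qquad
V(t) = \begin{bmatrix} 1 & t & t^2 & t^3 \\ 1 & x_1 & x_1^2 & x_1^3 \\ 1 & x_2 & x_2^2 & x_2^3 \\ 1 & x_3 & x_3^2 & x_3^3 \end{bmatrix}$$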
9. Use row operations to show that
$\det T = (b - a)(c - a)(c - b)$

10. Let $f(t) = \det V$, with $x_1$, $x_2$, and $x_3$ all distinct. Explain
why $f(t)$ is a cubic polynomial, show that the coefficient of
$t^3$ is nonzero, and find three points on the graph of $f$.

11. Find the area of the parallelogram determined by the points
$(1, 4)$, $(-1, 5)$, $(3, 9)$, and $(5, 8)$. How can you tell that the
quadrilateral determined by the points is actually a
parallelogram?

12. Use the concept of area of a parallelogram to write a
statement about a $2 \times 2$ matrix $A$ that is true if and only if $A$ is
invertible.


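13. Show that if $A$ is invertible, then $\operatorname{adj} A$ is invertible, and

$$(\operatorname{adj} A)^{-1} = \frac{1}{\det A} A$$

[Hint: Given matrices $B$ and $C$, what calculation(s) would
show that $C$ is the inverse of $B$?]

14. Let $A$, $B$, $C$, $D$, and $I$ be $n \times n$ matrices. Use the definition or
properties of a determinant to justify the following formulas.
Part (c) is useful in applications of eigenvalues (Chapter 5).

a. $\det \begin{bmatrix} A & 0 \\ 0 & I \end{bmatrix} = \det A$

b. $\det \begin{bmatrix} I & 0 \\ C & D \end{bmatrix} = \det D$

c. $\det \begin{bmatrix} A & 0 \\ C & D \end{bmatrix} = (\det A)(\det D) = \det \begin{bmatrix} A & B \\ 0 & D \end{bmatrix}$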

15. Let $A$, $B$, $C$, and $D$ be $n \times n$ matrices with $A$ invertible.

a. Find matrices $X$ and $Y$ to produce the block LU
factorization

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} I & 0 \\ X & I \end{bmatrix} \begin{bmatrix} A & B \\ 0 & Y \end{bmatrix}$$

and then show that

$$\det \begin{bmatrix} A & B \\ C & D \end{bmatrix} = (\det A) \cdot \det(D - CA^{-1}B)$$

b. Show that if $AC = CA$, then

$$\det \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \det(AD - CB)$$


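16. Let $J$ be the $n \times n$ matrix of all 1’s, and consider
$A = (a - b)I + bJ$; that is,

$$A = \begin{bmatrix} a & b & b & \cdots & b \\ b & a & b & \cdots & b \\ b & b & a & \cdots & b \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ b & b & b & \cdots & a \end{bmatrix}$$

Confirm that $\det A = (a - b)^{n-1}[a + (n - 1)b]$ as follows: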

a. Subtract row 2 from row 1, row 3 from row 2, and so on, 
and explain why this does not change the determinant of 
the matrix. 

b. With the resulting matrix from part (a), add column 1 to 
column 2, then add this new column 2 to column 3, and so 
on, and explain why this does not change the determinant. 

c. Find the determinant of the resulting matrix from (b). 

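17. Let $A$ be the original matrix given in Exercise 16, and let

$$B = \begin{bmatrix} a - b & b & b & \cdots & b \\ 0 & a & b & \cdots & b \\ 0 & b & a & \cdots & b \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & b & b & \cdots & a \end{bmatrix}, \qquad
C = \begin{bmatrix} b & b & b & \cdots & b \\ b & a & b & \cdots & b \\ b & b & a & \cdots & b \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ b & b & b & \cdots & a \end{bmatrix}$$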

Notice that A, B, and C are nearly the same except that 
the first column of A equals the sum of the first columns of 
B and C. A linearity property of the determinant function, 
discussed in Section 3.2, says that det A = det B + det C. 
Use this fact to prove the formula in Exercise 16 by induction 
on the size of matrix A . 


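18. [M] Apply the result of Exercise 16 to find the determinants
of the following matrices, and confirm your answers using a
matrix program.

$$\begin{bmatrix} 3 & 8 & 8 & 8 \\ 8 & 3 & 8 & 8 \\ 8 & 8 & 3 & 8 \\ 8 & 8 & 8 & 3 \end{bmatrix}, \qquad
\begin{bmatrix} 8 & 3 & 3 & 3 & 3 \\ 3 & 8 & 3 & 3 & 3 \\ 3 & 3 & 8 & 3 & 3 \\ 3 & 3 & 3 & 8 & 3 \\ 3 & 3 & 3 & 3 & 8 \end{bmatrix}$$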


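19. [M] Use a matrix program to compute the determinants of
the following matrices.

$$\begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{bmatrix}, \qquad
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 2 & 2 \\ 1 & 2 & 3 & 3 \\ 1 & 2 & 3 & 4 \end{bmatrix}, \qquad
\begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 2 & 2 & 2 \\ 1 & 2 & 3 & 3 & 3 \\ 1 & 2 & 3 & 4 & 4 \\ 1 & 2 & 3 & 4 & 5 \end{bmatrix}$$

Use the results to guess the determinant of the matrix below,
and confirm your guess by using row operations to evaluate
that determinant.

$$\begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 2 & \cdots & 2 \\ 1 & 2 & 3 & \cdots & 3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 2 & 3 & \cdots & n \end{bmatrix}$$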


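20. [M] Use the method of Exercise 19 to guess the determinant
of

$$\begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & 3 & 3 & \cdots & 3 \\ 1 & 3 & 6 & \cdots & 6 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 3 & 6 & \cdots & 3(n - 1) \end{bmatrix}$$

Justify your conjecture. [Hint: Use Exercise 14(c) and the
result of Exercise 19.]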

























Vector Spaces 




INTRODUCTORY EXAMPLE 

Space Flight and Control Systems 

Twelve stories high and weighing 75 tons, Columbia rose 
majestically off the launching pad on a cool Palm Sunday 
morning in April 1981. A product of ten years’ intensive 
research and development, the first U.S. space shuttle was a 
triumph of control systems engineering design, involving 
many branches of engineering: aeronautical, chemical,
electrical, hydraulic, and mechanical. 

The space shuttle’s control systems are absolutely 
critical for flight. Because the shuttle is an unstable 
airframe, it requires constant computer monitoring during 
atmospheric flight. The flight control system sends a stream 
of commands to aerodynamic control surfaces and 44 small 
thruster jets. Figure 1 shows a typical closed-loop feedback 
system that controls the pitch of the shuttle during flight. 



(The pitch is the elevation angle of the nose cone.) The 
junction symbols (0) show where signals from various 
sensors are added to the computer signals flowing along 
the top of the figure. 

Mathematically, the input and output signals to an
engineering system are functions. It is important in
applications that these functions can be added, as in
Figure 1, and multiplied by scalars. These two operations
on functions have algebraic properties that are completely
analogous to the operations of adding vectors in $\mathbb{R}^n$
and multiplying a vector by a scalar, as we shall see
in Sections 4.1 and 4.8. For this reason, the set of all
possible inputs (functions) is called a vector space. The
mathematical foundation for systems engineering rests
on vector spaces of functions, and Chapter 4 extends the
theory of vectors in $\mathbb{R}^n$ to include such functions. Later on,
you will see how other vector spaces arise in engineering,
physics, and statistics.

FIGURE 1 Pitch control system for the space shuttle. (Source: Adapted from Space Shuttle GN&C Operations
Manual, Rockwell International, ©1988.)



The mathematical seeds planted in Chapters 1 and 2 germinate and begin to blossom
in this chapter. The beauty and power of linear algebra will be seen more clearly when
you view $\mathbb{R}^n$ as only one of a variety of vector spaces that arise naturally in applied
problems. Actually, a study of vector spaces is not much different from a study of $\mathbb{R}^n$
itself, because you can use your geometric experience with $\mathbb{R}^2$ and $\mathbb{R}^3$ to visualize many
general concepts.

Beginning with basic definitions in Section 4.1, the general vector space framework
develops gradually throughout the chapter. A goal of Sections 4.3-4.5 is to demonstrate
how closely other vector spaces resemble $\mathbb{R}^n$. Section 4.6 on rank is one of the high
points of the chapter, using vector space terminology to tie together important facts about
rectangular matrices. Section 4.8 applies the theory of the chapter to discrete signals and
difference equations used in digital control systems such as in the space shuttle. Markov
chains, in Section 4.9, provide a change of pace from the more theoretical sections of
the chapter and make good examples for concepts to be introduced in Chapter 5.


4.1 VECTOR SPACES AND SUBSPACES 

Much of the theory in Chapters 1 and 2 rested on certain simple and obvious
algebraic properties of $\mathbb{R}^n$, listed in Section 1.3. In fact, many other mathematical systems
have the same properties. The specific properties of interest are listed in the following
definition.

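DEFINITION A vector space is a nonempty set $V$ of objects, called vectors, on which are
defined two operations, called addition and multiplication by scalars (real numbers),
subject to the ten axioms (or rules) listed below.¹ The axioms must hold for all
vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ in $V$ and for all scalars $c$ and $d$.

1. The sum of $\mathbf{u}$ and $\mathbf{v}$, denoted by $\mathbf{u} + \mathbf{v}$, is in $V$.

2. $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$.

3. $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$.

4. There is a zero vector $\mathbf{0}$ in $V$ such that $\mathbf{u} + \mathbf{0} = \mathbf{u}$.

5. For each $\mathbf{u}$ in $V$, there is a vector $-\mathbf{u}$ in $V$ such that $\mathbf{u} + (-\mathbf{u}) = \mathbf{0}$.

6. The scalar multiple of $\mathbf{u}$ by $c$, denoted by $c\mathbf{u}$, is in $V$.

7. $c(\mathbf{u} + \mathbf{v}) = c\mathbf{u} + c\mathbf{v}$.

8. $(c + d)\mathbf{u} = c\mathbf{u} + d\mathbf{u}$.

9. $c(d\mathbf{u}) = (cd)\mathbf{u}$.

10. $1\mathbf{u} = \mathbf{u}$.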


1 Technically, V is a real vector space. All of the theory in this chapter also holds for a complex vector space
in which the scalars are complex numbers. We will look at this briefly in Chapter 5. Until then, all scalars are
assumed to be real.

Using only these axioms, one can show that the zero vector in Axiom 4 is unique,
and the vector $-\mathbf{u}$, called the negative of $\mathbf{u}$, in Axiom 5 is unique for each $\mathbf{u}$ in $V$.
See Exercises 25 and 26. Proofs of the following simple facts are also outlined in the
exercises:

For each $\mathbf{u}$ in $V$ and scalar $c$,

$$0\mathbf{u} = \mathbf{0} \tag{1}$$

$$c\mathbf{0} = \mathbf{0} \tag{2}$$

$$-\mathbf{u} = (-1)\mathbf{u} \tag{3}$$



FIGURE 1 


EXAMPLE 1 The spaces $\mathbb{R}^n$, where $n \ge 1$, are the premier examples of vector
spaces. The geometric intuition developed for $\mathbb{R}^3$ will help you understand and visualize
many concepts throughout the chapter. ■

EXAMPLE 2 Let $V$ be the set of all arrows (directed line segments) in
three-dimensional space, with two arrows regarded as equal if they have the same length and
point in the same direction. Define addition by the parallelogram rule (from Section 1.3),
and for each $\mathbf{v}$ in $V$, define $c\mathbf{v}$ to be the arrow whose length is $|c|$ times the length of
$\mathbf{v}$, pointing in the same direction as $\mathbf{v}$ if $c > 0$ and otherwise pointing in the opposite
direction. (See Figure 1.) Show that $V$ is a vector space. This space is a common model
in physical problems for various forces.

SOLUTION The definition of $V$ is geometric, using concepts of length and direction.
No $xyz$-coordinate system is involved. An arrow of zero length is a single point and
represents the zero vector. The negative of $\mathbf{v}$ is $(-1)\mathbf{v}$. So Axioms 1, 4, 5, 6, and 10 are
evident. The rest are verified by geometry. For instance, see Figures 2 and 3. ■



FIGURE 2 u + v = v + u. 



FIGURE 3 (u + v) + w = u + (v + w). 


EXAMPLE 3 Let $\mathbb{S}$ be the space of all doubly infinite sequences of numbers (usually
written in a row rather than a column):

$$\{y_k\} = (\ldots, y_{-2}, y_{-1}, y_0, y_1, y_2, \ldots)$$

If $\{z_k\}$ is another element of $\mathbb{S}$, then the sum $\{y_k\} + \{z_k\}$ is the sequence $\{y_k + z_k\}$
formed by adding corresponding terms of $\{y_k\}$ and $\{z_k\}$. The scalar multiple $c\{y_k\}$ is
the sequence $\{cy_k\}$. The vector space axioms are verified in the same way as for $\mathbb{R}^n$.

Elements of $\mathbb{S}$ arise in engineering, for example, whenever a signal is measured (or
sampled) at discrete times. A signal might be electrical, mechanical, optical, and so on.
The major control systems for the space shuttle, mentioned in the chapter introduction,
use discrete (or digital) signals. For convenience, we will call $\mathbb{S}$ the space of
(discrete-time) signals. A signal may be visualized by a graph as in Figure 4. ■

FIGURE 4 A discrete-time signal.


EXAMPLE 4 For $n \ge 0$, the set $\mathbb{P}_n$ of polynomials of degree at most $n$ consists of
all polynomials of the form

$$\mathbf{p}(t) = a_0 + a_1t + a_2t^2 + \cdots + a_nt^n \tag{4}$$

where the coefficients $a_0, \ldots, a_n$ and the variable $t$ are real numbers. The degree of
$\mathbf{p}$ is the highest power of $t$ in (4) whose coefficient is not zero. If $\mathbf{p}(t) = a_0 \ne 0$, the
degree of $\mathbf{p}$ is zero. If all the coefficients are zero, $\mathbf{p}$ is called the zero polynomial. The
zero polynomial is included in $\mathbb{P}_n$ even though its degree, for technical reasons, is not
defined.

If $\mathbf{p}$ is given by (4) and if $\mathbf{q}(t) = b_0 + b_1t + \cdots + b_nt^n$, then the sum $\mathbf{p} + \mathbf{q}$ is
defined by

$$\begin{aligned}
(\mathbf{p} + \mathbf{q})(t) &= \mathbf{p}(t) + \mathbf{q}(t) \\
&= (a_0 + b_0) + (a_1 + b_1)t + \cdots + (a_n + b_n)t^n
\end{aligned}$$

The scalar multiple $c\mathbf{p}$ is the polynomial defined by

$$(c\mathbf{p})(t) = c\mathbf{p}(t) = ca_0 + (ca_1)t + \cdots + (ca_n)t^n$$

These definitions satisfy Axioms 1 and 6 because $\mathbf{p} + \mathbf{q}$ and $c\mathbf{p}$ are polynomials
of degree less than or equal to $n$. Axioms 2, 3, and 7-10 follow from properties of the
real numbers. Clearly, the zero polynomial acts as the zero vector in Axiom 4. Finally,
$(-1)\mathbf{p}$ acts as the negative of $\mathbf{p}$, so Axiom 5 is satisfied. Thus $\mathbb{P}_n$ is a vector space.

The vector spaces $\mathbb{P}_n$ for various $n$ are used, for instance, in statistical trend analysis
of data, discussed in Section 6.8. ■
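The operations of Example 4 can be modeled by storing a polynomial in $\mathbb{P}_n$ as its list of $n + 1$ coefficients. The sketch below is an illustration only; the representation and the names `poly_add`, `poly_scale`, and `poly_eval` are assumptions made for this example and do not come from the text.

```python
def poly_add(p, q):
    """Sum of two polynomials in P_n, each stored as [a_0, a_1, ..., a_n]."""
    if len(p) != len(q):
        raise ValueError("both polynomials must be stored with n + 1 coefficients")
    return [a + b for a, b in zip(p, q)]


def poly_scale(c, p):
    """Scalar multiple c*p, computed coefficient by coefficient."""
    return [c * a for a in p]


def poly_eval(p, t):
    """Value p(t) = a_0 + a_1 t + ... + a_n t^n."""
    return sum(a * t ** k for k, a in enumerate(p))


# Two elements of P_2 (illustrative values).
p = [1, 0, 2]     # 1 + 2t^2
q = [3, -1, 0]    # 3 - t

s = poly_add(p, q)         # coefficients of p + q
d = poly_scale(2, p)       # coefficients of 2p
zero = [0, 0, 0]           # the zero polynomial in P_2

print(s, d, poly_eval(s, 2))
```

Because the sum and the scalar multiple are again lists of length $n + 1$, the results stay in $\mathbb{P}_n$, which is the content of Axioms 1 and 6; evaluating `s` at a point agrees with adding the values of `p` and `q` there.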


FIGURE 5 The sum of two vectors (functions).

EXAMPLE 5 Let $V$ be the set of all real-valued functions defined on a set $D$.
(Typically, $D$ is the set of real numbers or some interval on the real line.) Functions are added
in the usual way: $\mathbf{f} + \mathbf{g}$ is the function whose value at $t$ in the domain $D$ is $\mathbf{f}(t) + \mathbf{g}(t)$.
Likewise, for a scalar $c$ and an $\mathbf{f}$ in $V$, the scalar multiple $c\mathbf{f}$ is the function whose value
at $t$ is $c\mathbf{f}(t)$. For instance, if $D = \mathbb{R}$, $\mathbf{f}(t) = 1 + \sin 2t$, and $\mathbf{g}(t) = 2 + .5t$, then

$$(\mathbf{f} + \mathbf{g})(t) = 3 + \sin 2t + .5t \quad \text{and} \quad (2\mathbf{g})(t) = 4 + t$$

Two functions in $V$ are equal if and only if their values are equal for every $t$ in
$D$. Hence the zero vector in $V$ is the function that is identically zero, $\mathbf{f}(t) = 0$ for all $t$,
and the negative of $\mathbf{f}$ is $(-1)\mathbf{f}$. Axioms 1 and 6 are obviously true, and the other axioms
follow from properties of the real numbers, so $V$ is a vector space. ■
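Treating functions as vectors, as in Example 5, is easy to imitate in code. The following is a minimal sketch, assuming $D = \mathbb{R}$ and the two functions from the example; the helper names `add` and `scale` are hypothetical.

```python
import math


def add(f, g):
    """The function f + g, whose value at t is f(t) + g(t)."""
    return lambda t: f(t) + g(t)


def scale(c, f):
    """The function c*f, whose value at t is c * f(t)."""
    return lambda t: c * f(t)


def f(t):
    return 1 + math.sin(2 * t)


def g(t):
    return 2 + 0.5 * t


h = add(f, g)        # (f + g)(t) = 3 + sin 2t + .5t
k = scale(2, g)      # (2g)(t) = 4 + t


def zero(t):
    """The zero vector of V: the function that is identically zero."""
    return 0.0


for t in (-1.0, 0.0, 0.5, 2.0):
    print(t, h(t), k(t))
```

Each of `h` and `k` is itself a single object (a function), which matches the point of view that a function is one "vector" in $V$.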

It is important to think of each function in the vector space $V$ of Example 5 as a
single object, as just one “point” or vector in the vector space. The sum of two vectors $\mathbf{f}$
and $\mathbf{g}$ (functions in $V$, or elements of any vector space) can be visualized as in Figure 5,
because this can help you carry over to a general vector space the geometric intuition
you have developed while working with the vector space $\mathbb{R}^n$. See the Study Guide for
help as you learn to adopt this more general point of view.

Subspaces

In many problems, a vector space consists of an appropriate subset of vectors from some
larger vector space. In this case, only three of the ten vector space axioms need to be
checked; the rest are automatically satisfied.

DEFINITION A subspace of a vector space $V$ is a subset $H$ of $V$ that has three properties:

a. The zero vector of $V$ is in $H$.²

b. $H$ is closed under vector addition. That is, for each $\mathbf{u}$ and $\mathbf{v}$ in $H$, the sum
$\mathbf{u} + \mathbf{v}$ is in $H$.

c. $H$ is closed under multiplication by scalars. That is, for each $\mathbf{u}$ in $H$ and each
scalar $c$, the vector $c\mathbf{u}$ is in $H$.

Properties (a), (b), and (c) guarantee that a subspace $H$ of $V$ is itself a vector
space, under the vector space operations already defined in $V$. To verify this, note
that properties (a), (b), and (c) are Axioms 1, 4, and 6. Axioms 2, 3, and 7-10 are
automatically true in $H$ because they apply to all elements of $V$, including those in $H$.
Axiom 5 is also true in $H$, because if $\mathbf{u}$ is in $H$, then $(-1)\mathbf{u}$ is in $H$ by property (c), and
we know from equation (3) earlier in this section that $(-1)\mathbf{u}$ is the vector $-\mathbf{u}$ in Axiom 5.

So every subspace is a vector space. Conversely, every vector space is a subspace
(of itself and possibly of other larger spaces). The term subspace is used when at least
two vector spaces are in mind, with one inside the other, and the phrase subspace of $V$
identifies $V$ as the larger space. (See Figure 6.)

FIGURE 6 A subspace of $V$.

FIGURE 7 The $x_1x_2$-plane as a subspace of $\mathbb{R}^3$.

EXAMPLE 6 The set consisting of only the zero vector in a vector space V is a 
subspace of V , called the zero subspace and written as {0}. ■ 


EXAMPLE 7 Let $\mathbb{P}$ be the set of all polynomials with real coefficients, with
operations in $\mathbb{P}$ defined as for functions. Then $\mathbb{P}$ is a subspace of the space of all real-valued
functions defined on $\mathbb{R}$. Also, for each $n \ge 0$, $\mathbb{P}_n$ is a subspace of $\mathbb{P}$, because $\mathbb{P}_n$ is a
subset of $\mathbb{P}$ that contains the zero polynomial, the sum of two polynomials in $\mathbb{P}_n$ is also
in $\mathbb{P}_n$, and a scalar multiple of a polynomial in $\mathbb{P}_n$ is also in $\mathbb{P}_n$. ■


EXAMPLE 8 The vector space $\mathbb{R}^2$ is not a subspace of $\mathbb{R}^3$ because $\mathbb{R}^2$ is not even a
subset of $\mathbb{R}^3$. (The vectors in $\mathbb{R}^3$ all have three entries, whereas the vectors in $\mathbb{R}^2$ have
only two.) The set

$$H = \left\{ \begin{bmatrix} s \\ t \\ 0 \end{bmatrix} : s \text{ and } t \text{ are real} \right\}$$

is a subset of $\mathbb{R}^3$ that “looks” and “acts” like $\mathbb{R}^2$, although it is logically distinct from
$\mathbb{R}^2$. See Figure 7. Show that $H$ is a subspace of $\mathbb{R}^3$.

SOLUTION The zero vector is in $H$, and $H$ is closed under vector addition and scalar
multiplication because these operations on vectors in $H$ always produce vectors whose
third entries are zero (and so belong to $H$). Thus $H$ is a subspace of $\mathbb{R}^3$. ■


2 Some texts replace property (a) in this definition by the assumption that H is nonempty. Then (a) could be
deduced from (c) and the fact that 0u = 0. But the best way to test for a subspace is to look first for the zero
vector. If 0 is in H, then properties (b) and (c) must be checked. If 0 is not in H, then H cannot be a
subspace and the other properties need not be checked.


FIGURE 8 

A line that is not a vector space. 


EXAMPLE 9 A plane in $\mathbb{R}^3$ not through the origin is not a subspace of $\mathbb{R}^3$, because
the plane does not contain the zero vector of $\mathbb{R}^3$. Similarly, a line in $\mathbb{R}^2$ not through the
origin, such as in Figure 8, is not a subspace of $\mathbb{R}^2$. ■

A Subspace Spanned by a Set 

The next example illustrates one of the most common ways of describing a subspace.
As in Chapter 1, the term linear combination refers to any sum of scalar multiples of
vectors, and $\operatorname{Span}\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ denotes the set of all vectors that can be written as linear
combinations of $\mathbf{v}_1, \ldots, \mathbf{v}_p$.

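EXAMPLE 10 Given $\mathbf{v}_1$ and $\mathbf{v}_2$ in a vector space $V$, let $H = \operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$. Show
that $H$ is a subspace of $V$.

SOLUTION The zero vector is in $H$, since $\mathbf{0} = 0\mathbf{v}_1 + 0\mathbf{v}_2$. To show that $H$ is closed
under vector addition, take two arbitrary vectors in $H$, say,

$$\mathbf{u} = s_1\mathbf{v}_1 + s_2\mathbf{v}_2 \quad \text{and} \quad \mathbf{w} = t_1\mathbf{v}_1 + t_2\mathbf{v}_2$$

By Axioms 2, 3, and 8 for the vector space $V$,

$$\begin{aligned}
\mathbf{u} + \mathbf{w} &= (s_1\mathbf{v}_1 + s_2\mathbf{v}_2) + (t_1\mathbf{v}_1 + t_2\mathbf{v}_2) \\
&= (s_1 + t_1)\mathbf{v}_1 + (s_2 + t_2)\mathbf{v}_2
\end{aligned}$$

So $\mathbf{u} + \mathbf{w}$ is in $H$. Furthermore, if $c$ is any scalar, then by Axioms 7 and 9,

$$c\mathbf{u} = c(s_1\mathbf{v}_1 + s_2\mathbf{v}_2) = (cs_1)\mathbf{v}_1 + (cs_2)\mathbf{v}_2$$

which shows that $c\mathbf{u}$ is in $H$ and $H$ is closed under scalar multiplication. Thus $H$ is a
subspace of $V$. ■

FIGURE 9 An example of a subspace.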

In Section 4.5, you will see that every nonzero subspace of $\mathbb{R}^3$, other than $\mathbb{R}^3$ itself,
is either $\operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$ for some linearly independent $\mathbf{v}_1$ and $\mathbf{v}_2$ or $\operatorname{Span}\{\mathbf{v}\}$ for $\mathbf{v} \ne \mathbf{0}$. In
the first case, the subspace is a plane through the origin; in the second case, it is a line
through the origin. (See Figure 9.) It is helpful to keep these geometric pictures in mind,
even for an abstract vector space.

The argument in Example 10 can easily be generalized to prove the following
theorem.

THEOREM 1 If $\mathbf{v}_1, \ldots, \mathbf{v}_p$ are in a vector space $V$, then $\operatorname{Span}\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ is a subspace of $V$.

We call $\operatorname{Span}\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ the subspace spanned (or generated) by $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$.
Given any subspace $H$ of $V$, a spanning (or generating) set for $H$ is a set $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$
in $H$ such that $H = \operatorname{Span}\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$.

The next example shows how to use Theorem 1. 

EXAMPLE 11 Let $H$ be the set of all vectors of the form $(a - 3b, b - a, a, b)$,
where $a$ and $b$ are arbitrary scalars. That is, let $H = \{(a - 3b, b - a, a, b) : a \text{ and } b \text{ in } \mathbb{R}\}$.
Show that $H$ is a subspace of $\mathbb{R}^4$.

SOLUTION Write the vectors in $H$ as column vectors. Then an arbitrary vector in $H$
has the form

$$\begin{bmatrix} a - 3b \\ b - a \\ a \\ b \end{bmatrix}
= a \underbrace{\begin{bmatrix} 1 \\ -1 \\ 1 \\ 0 \end{bmatrix}}_{\mathbf{v}_1}
+ b \underbrace{\begin{bmatrix} -3 \\ 1 \\ 0 \\ 1 \end{bmatrix}}_{\mathbf{v}_2}$$

This calculation shows that $H = \operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$, where $\mathbf{v}_1$ and $\mathbf{v}_2$ are the vectors indicated
above. Thus $H$ is a subspace of $\mathbb{R}^4$ by Theorem 1. ■
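The decomposition in Example 11 can be spot-checked for a range of scalars. This is a small sketch for illustration; the function names `combo` and `element_of_H` are hypothetical, and the integer test values are arbitrary.

```python
# Spanning vectors read off from the form (a - 3b, b - a, a, b).
v1 = (1, -1, 1, 0)
v2 = (-3, 1, 0, 1)


def combo(a, b):
    """The linear combination a*v1 + b*v2."""
    return tuple(a * x + b * y for x, y in zip(v1, v2))


def element_of_H(a, b):
    """The vector (a - 3b, b - a, a, b) that defines the set H."""
    return (a - 3 * b, b - a, a, b)


# Every defining vector of H matches the corresponding combination of v1 and v2.
matches = all(combo(a, b) == element_of_H(a, b)
              for a in range(-3, 4) for b in range(-3, 4))
print(matches)
```

A finite check like this is not a proof; the general statement follows from comparing entries symbolically, as in the displayed equation of the example.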

Example 11 illustrates a useful technique of expressing a subspace $H$ as the set
of linear combinations of some small collection of vectors. If $H = \operatorname{Span}\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$,
we can think of the vectors $\mathbf{v}_1, \ldots, \mathbf{v}_p$ in the spanning set as “handles” that allow us to
hold on to the subspace $H$. Calculations with the infinitely many vectors in $H$ are often
reduced to operations with the finite number of vectors in the spanning set.

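EXAMPLE 12 For what value(s) of $h$ will $\mathbf{y}$ be in the subspace of $\mathbb{R}^3$ spanned by
$\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$, if

$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ -1 \\ -2 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} 5 \\ -4 \\ -7 \end{bmatrix}, \quad
\mathbf{v}_3 = \begin{bmatrix} -3 \\ 1 \\ 0 \end{bmatrix}, \quad \text{and} \quad
\mathbf{y} = \begin{bmatrix} -4 \\ 3 \\ h \end{bmatrix}$$

SOLUTION This question is Practice Problem 2 in Section 1.3, written here with
the term subspace rather than $\operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. The solution there shows that $\mathbf{y}$ is in
$\operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ if and only if $h = 5$. That solution is worth reviewing now, along with
Exercises 11-16 and 19-21 in Section 1.3. ■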

Although many vector spaces in this chapter will be subspaces of $\mathbb{R}^n$, it is important
to keep in mind that the abstract theory applies to other vector spaces as well. Vector
spaces of functions arise in many applications, and they will receive more attention later.


PRACTICE PROBLEMS

1. Show that the set $H$ of all points in $\mathbb{R}^2$ of the form $(3s, 2 + 5s)$ is not a vector space,
by showing that it is not closed under scalar multiplication. (Find a specific vector $\mathbf{u}$
in $H$ and a scalar $c$ such that $c\mathbf{u}$ is not in $H$.)

2. Let $W = \operatorname{Span}\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$, where $\mathbf{v}_1, \ldots, \mathbf{v}_p$ are in a vector space $V$. Show that
$\mathbf{v}_k$ is in $W$ for $1 \le k \le p$. [Hint: First write an equation that shows that $\mathbf{v}_1$ is in $W$.
Then adjust your notation for the general case.]

3. An $n \times n$ matrix $A$ is said to be symmetric if $A^T = A$. Let $S$ be the set of all $3 \times 3$
symmetric matrices. Show that $S$ is a subspace of $M_{3 \times 3}$, the vector space of $3 \times 3$
matrices.


4.1 EXERCISES 


1. Let $V$ be the first quadrant in the $xy$-plane; that is, let

\[
V = \left\{ \begin{bmatrix} x \\ y \end{bmatrix} : x \ge 0,\ y \ge 0 \right\}
\]

a. If $\mathbf{u}$ and $\mathbf{v}$ are in $V$, is $\mathbf{u} + \mathbf{v}$ in $V$? Why?

b. Find a specific vector $\mathbf{u}$ in $V$ and a specific scalar $c$ such that $c\mathbf{u}$ is not in $V$. (This is enough to show that $V$ is not a vector space.)

2. Let $W$ be the union of the first and third quadrants in the $xy$-plane. That is, let $W = \left\{ \begin{bmatrix} x \\ y \end{bmatrix} : xy \ge 0 \right\}$.

a. If $\mathbf{u}$ is in $W$ and $c$ is any scalar, is $c\mathbf{u}$ in $W$? Why?

b. Find specific vectors $\mathbf{u}$ and $\mathbf{v}$ in $W$ such that $\mathbf{u} + \mathbf{v}$ is not in $W$. This is enough to show that $W$ is not a vector space.



3. Let $H$ be the set of points inside and on the unit circle in the $xy$-plane. That is, let $H = \left\{ \begin{bmatrix} x \\ y \end{bmatrix} : x^2 + y^2 \le 1 \right\}$. Find a specific example (two vectors, or a vector and a scalar) to show that $H$ is not a subspace of $\mathbb{R}^2$.


4. Construct a geometric figure that illustrates why a line in $\mathbb{R}^2$ not through the origin is not closed under vector addition.


In Exercises 5-8, determine if the given set is a subspace of $\mathbb{P}_n$ for an appropriate value of $n$. Justify your answers.

5. All polynomials of the form $\mathbf{p}(t) = at^2$, where $a$ is in $\mathbb{R}$.

6. All polynomials of the form $\mathbf{p}(t) = a + t^2$, where $a$ is in $\mathbb{R}$.

7. All polynomials of degree at most 3, with integers as coefficients.

8. All polynomials in $\mathbb{P}_n$ such that $\mathbf{p}(0) = 0$.


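9. Let $H$ be the set of all vectors of the form $\begin{bmatrix} s \\ 3s \\ 2s \end{bmatrix}$. Find a vector $\mathbf{v}$ in $\mathbb{R}^3$ such that $H = \operatorname{Span}\{\mathbf{v}\}$. Why does this show that $H$ is a subspace of $\mathbb{R}^3$?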


10. Let $H$ be the set of all vectors of the form $\begin{bmatrix} 2t \\ 0 \\ -t \end{bmatrix}$. Show that $H$ is a subspace of $\mathbb{R}^3$. (Use the method of Exercise 9.)


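11. Let $W$ be the set of all vectors of the form $\begin{bmatrix} 5b + 2c \\ b \\ c \end{bmatrix}$, where $b$ and $c$ are arbitrary. Find vectors $\mathbf{u}$ and $\mathbf{v}$ such that $W = \operatorname{Span}\{\mathbf{u}, \mathbf{v}\}$. Why does this show that $W$ is a subspace of $\mathbb{R}^3$?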


12. Let $W$ be the set of all vectors of the form $\begin{bmatrix} s + 3t \\ s - t \\ 2s - t \\ 4t \end{bmatrix}$. Show that $W$ is a subspace of $\mathbb{R}^4$. (Use the method of Exercise 11.)


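13. Let $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 4 \\ 2 \\ 6 \end{bmatrix}$, and $\mathbf{w} = \begin{bmatrix} 3 \\ 1 \\ 2 \end{bmatrix}$.

a. Is $\mathbf{w}$ in $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$? How many vectors are in $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$?

b. How many vectors are in $\operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$?

c. Is $\mathbf{w}$ in the subspace spanned by $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$? Why?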


14. Let Vi, v 2 , v 3 be as in Exercise 13, and let w = 
the subspace spanned by {vi, v 2 , v 3 }? Why? 



. Is w in 


In Exercises 15-18, let $W$ be the set of all vectors of the form shown, where $a$, $b$, and $c$ represent arbitrary real numbers. In each case, either find a set $S$ of vectors that spans $W$ or give an example to show that $W$ is not a vector space.

15. $\begin{bmatrix} 3a + b \\ 4 \\ a - 5b \end{bmatrix}$

16. $\begin{bmatrix} -a + 1 \\ a - 6b \\ 2b + a \end{bmatrix}$

17. $\begin{bmatrix} a - b \\ b - c \\ c - a \\ b \end{bmatrix}$

18. $\begin{bmatrix} 4a + 3b \\ 0 \\ a + b + c \\ c - 2a \end{bmatrix}$


19. If a mass $m$ is placed at the end of a spring, and if the mass is pulled downward and released, the mass-spring system will begin to oscillate. The displacement $y$ of the mass from its resting position is given by a function of the form

\[
y(t) = c_1 \cos \omega t + c_2 \sin \omega t \tag{5}
\]

where $\omega$ is a constant that depends on the spring and the mass. (See the figure below.) Show that the set of all functions described in (5) (with $\omega$ fixed and $c_1$, $c_2$ arbitrary) is a vector space.



20. The set of all continuous real-valued functions defined on a closed interval $[a, b]$ in $\mathbb{R}$ is denoted by $C[a, b]$. This set is a subspace of the vector space of all real-valued functions defined on $[a, b]$.

a. What facts about continuous functions should be proved in order to demonstrate that $C[a, b]$ is indeed a subspace as claimed? (These facts are usually discussed in a calculus class.)

b. Show that $\{\mathbf{f} \text{ in } C[a, b] : \mathbf{f}(a) = \mathbf{f}(b)\}$ is a subspace of $C[a, b]$.

For fixed positive integers $m$ and $n$, the set $M_{m \times n}$ of all $m \times n$ matrices is a vector space, under the usual operations of addition of matrices and multiplication by real scalars.

21. Determine if the set $H$ of all matrices of the form $\begin{bmatrix} a & b \\ 0 & d \end{bmatrix}$ is a subspace of $M_{2 \times 2}$.

22. Let $F$ be a fixed $3 \times 2$ matrix, and let $H$ be the set of all matrices $A$ in $M_{2 \times 4}$ with the property that $FA = 0$ (the zero matrix in $M_{3 \times 4}$). Determine if $H$ is a subspace of $M_{2 \times 4}$.

In Exercises 23 and 24, mark each statement True or False. Justify each answer.

23. a. If $\mathbf{f}$ is a function in the vector space $V$ of all real-valued functions on $\mathbb{R}$ and if $\mathbf{f}(t) = 0$ for some $t$, then $\mathbf{f}$ is the zero vector in $V$.

b. A vector is an arrow in three-dimensional space.

c. A subset $H$ of a vector space $V$ is a subspace of $V$ if the zero vector is in $H$.

d. A subspace is also a vector space.

e. Analog signals are used in the major control systems for the space shuttle, mentioned in the introduction to the chapter.

24. a. A vector is any element of a vector space.

b. If $\mathbf{u}$ is a vector in a vector space $V$, then $(-1)\mathbf{u}$ is the same as the negative of $\mathbf{u}$.

c. A vector space is also a subspace.

d. $\mathbb{R}^2$ is a subspace of $\mathbb{R}^3$.

e. A subset $H$ of a vector space $V$ is a subspace of $V$ if the following conditions are satisfied: (i) the zero vector of $V$ is in $H$, (ii) $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{u} + \mathbf{v}$ are in $H$, and (iii) $c$ is a scalar and $c\mathbf{u}$ is in $H$.

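Exercises 25-29 show how the axioms for a vector space $V$ can be used to prove the elementary properties described after the definition of a vector space. Fill in the blanks with the appropriate axiom numbers. Because of Axiom 2, Axioms 4 and 5 imply, respectively, that $\mathbf{0} + \mathbf{u} = \mathbf{u}$ and $-\mathbf{u} + \mathbf{u} = \mathbf{0}$ for all $\mathbf{u}$.

25. Complete the following proof that the zero vector is unique. Suppose that $\mathbf{w}$ in $V$ has the property that $\mathbf{u} + \mathbf{w} = \mathbf{w} + \mathbf{u} = \mathbf{u}$ for all $\mathbf{u}$ in $V$. In particular, $\mathbf{0} + \mathbf{w} = \mathbf{0}$. But $\mathbf{0} + \mathbf{w} = \mathbf{w}$, by Axiom ____. Hence $\mathbf{w} = \mathbf{0} + \mathbf{w} = \mathbf{0}$.

26. Complete the following proof that $-\mathbf{u}$ is the unique vector in $V$ such that $\mathbf{u} + (-\mathbf{u}) = \mathbf{0}$. Suppose that $\mathbf{w}$ satisfies $\mathbf{u} + \mathbf{w} = \mathbf{0}$. Adding $-\mathbf{u}$ to both sides, we have

\[
\begin{aligned}
(-\mathbf{u}) + [\mathbf{u} + \mathbf{w}] &= (-\mathbf{u}) + \mathbf{0} \\
[(-\mathbf{u}) + \mathbf{u}] + \mathbf{w} &= (-\mathbf{u}) + \mathbf{0} && \text{by Axiom } \underline{\hspace{2em}} \text{ (a)} \\
\mathbf{0} + \mathbf{w} &= (-\mathbf{u}) + \mathbf{0} && \text{by Axiom } \underline{\hspace{2em}} \text{ (b)} \\
\mathbf{w} &= -\mathbf{u} && \text{by Axiom } \underline{\hspace{2em}} \text{ (c)}
\end{aligned}
\]

27. Fill in the missing axiom numbers in the following proof that $0\mathbf{u} = \mathbf{0}$ for every $\mathbf{u}$ in $V$.

\[
0\mathbf{u} = (0 + 0)\mathbf{u} = 0\mathbf{u} + 0\mathbf{u} \qquad \text{by Axiom } \underline{\hspace{2em}} \text{ (a)}
\]

Add the negative of $0\mathbf{u}$ to both sides:

\[
\begin{aligned}
0\mathbf{u} + (-0\mathbf{u}) &= [0\mathbf{u} + 0\mathbf{u}] + (-0\mathbf{u}) \\
0\mathbf{u} + (-0\mathbf{u}) &= 0\mathbf{u} + [0\mathbf{u} + (-0\mathbf{u})] && \text{by Axiom } \underline{\hspace{2em}} \text{ (b)} \\
\mathbf{0} &= 0\mathbf{u} + \mathbf{0} && \text{by Axiom } \underline{\hspace{2em}} \text{ (c)} \\
\mathbf{0} &= 0\mathbf{u} && \text{by Axiom } \underline{\hspace{2em}} \text{ (d)}
\end{aligned}
\]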


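28. Fill in the missing axiom numbers in the following proof that $c\mathbf{0} = \mathbf{0}$ for every scalar $c$.

\[
\begin{aligned}
c\mathbf{0} &= c(\mathbf{0} + \mathbf{0}) && \text{by Axiom } \underline{\hspace{2em}} \text{ (a)} \\
&= c\mathbf{0} + c\mathbf{0} && \text{by Axiom } \underline{\hspace{2em}} \text{ (b)}
\end{aligned}
\]

Add the negative of $c\mathbf{0}$ to both sides:

\[
\begin{aligned}
c\mathbf{0} + (-c\mathbf{0}) &= [c\mathbf{0} + c\mathbf{0}] + (-c\mathbf{0}) \\
c\mathbf{0} + (-c\mathbf{0}) &= c\mathbf{0} + [c\mathbf{0} + (-c\mathbf{0})] && \text{by Axiom } \underline{\hspace{2em}} \text{ (c)} \\
\mathbf{0} &= c\mathbf{0} + \mathbf{0} && \text{by Axiom } \underline{\hspace{2em}} \text{ (d)} \\
\mathbf{0} &= c\mathbf{0} && \text{by Axiom } \underline{\hspace{2em}} \text{ (e)}
\end{aligned}
\]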

29. Prove that $(-1)\mathbf{u} = -\mathbf{u}$. [Hint: Show that $\mathbf{u} + (-1)\mathbf{u} = \mathbf{0}$. Use some axioms and the results of Exercises 26 and 27.]

30. Suppose $c\mathbf{u} = \mathbf{0}$ for some nonzero scalar $c$. Show that $\mathbf{u} = \mathbf{0}$. Mention the axioms or properties you use.

31. Let $\mathbf{u}$ and $\mathbf{v}$ be vectors in a vector space $V$, and let $H$ be any subspace of $V$ that contains both $\mathbf{u}$ and $\mathbf{v}$. Explain why $H$ also contains $\operatorname{Span}\{\mathbf{u}, \mathbf{v}\}$. This shows that $\operatorname{Span}\{\mathbf{u}, \mathbf{v}\}$ is the smallest subspace of $V$ that contains both $\mathbf{u}$ and $\mathbf{v}$.

32. Let $H$ and $K$ be subspaces of a vector space $V$. The intersection of $H$ and $K$, written as $H \cap K$, is the set of $\mathbf{v}$ in $V$ that belong to both $H$ and $K$. Show that $H \cap K$ is a subspace of $V$. (See the figure.) Give an example in $\mathbb{R}^2$ to show that the union of two subspaces is not, in general, a subspace.



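33. Given subspaces $H$ and $K$ of a vector space $V$, the sum of $H$ and $K$, written as $H + K$, is the set of all vectors in $V$ that can be written as the sum of two vectors, one in $H$ and the other in $K$; that is,

\[
H + K = \{\mathbf{w} : \mathbf{w} = \mathbf{u} + \mathbf{v} \text{ for some } \mathbf{u} \text{ in } H \text{ and some } \mathbf{v} \text{ in } K\}
\]

a. Show that $H + K$ is a subspace of $V$.

b. Show that $H$ is a subspace of $H + K$ and $K$ is a subspace of $H + K$.

34. Suppose $\mathbf{u}_1, \dots, \mathbf{u}_p$ and $\mathbf{v}_1, \dots, \mathbf{v}_q$ are vectors in a vector space $V$, and let

\[
H = \operatorname{Span}\{\mathbf{u}_1, \dots, \mathbf{u}_p\} \quad \text{and} \quad K = \operatorname{Span}\{\mathbf{v}_1, \dots, \mathbf{v}_q\}
\]

Show that $H + K = \operatorname{Span}\{\mathbf{u}_1, \dots, \mathbf{u}_p, \mathbf{v}_1, \dots, \mathbf{v}_q\}$.

35. [M] Show that $\mathbf{w}$ is in the subspace of $\mathbb{R}^4$ spanned by $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$, where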



36. [M] Determine if $\mathbf{y}$ is in the subspace of $\mathbb{R}^4$ spanned by the columns of $A$, where


1 

1 _ 


3 

-5 

-9" 

00 

, A = 

00 

7 

-6 

6 

-5 

00 

3 

1 

Oi 

1 _ 


<N 

_1 

-2 

-9 


37. [M] The vector space $H = \operatorname{Span}\{1, \cos^2 t, \cos^4 t, \cos^6 t\}$ contains at least two interesting functions that will be used in a later exercise:

\[
\begin{aligned}
\mathbf{f}(t) &= 1 - 8\cos^2 t + 8\cos^4 t \\
\mathbf{g}(t) &= -1 + 18\cos^2 t - 48\cos^4 t + 32\cos^6 t
\end{aligned}
\]

Study the graph of $\mathbf{f}$ for $0 \le t \le 2\pi$, and guess a simple formula for $\mathbf{f}(t)$. Verify your conjecture by graphing the difference between $1 + \mathbf{f}(t)$ and your formula for $\mathbf{f}(t)$. (Hopefully, you will see the constant function 1.) Repeat for $\mathbf{g}$.

38. [M] Repeat Exercise 37 for the functions

\[
\begin{aligned}
\mathbf{f}(t) &= 3\sin t - 4\sin^3 t \\
\mathbf{g}(t) &= 1 - 8\sin^2 t + 8\sin^4 t \\
\mathbf{h}(t) &= 5\sin t - 20\sin^3 t + 16\sin^5 t
\end{aligned}
\]

in the vector space $\operatorname{Span}\{1, \sin t, \sin^2 t, \dots, \sin^5 t\}$.


SOLUTIONS TO PRACTICE PROBLEMS 


1. Take any $\mathbf{u}$ in $H$, say, $\mathbf{u} = \begin{bmatrix} 3 \\ 7 \end{bmatrix}$, and take any $c \ne 1$, say, $c = 2$. Then $c\mathbf{u} = \begin{bmatrix} 6 \\ 14 \end{bmatrix}$. If this is in $H$, then there is some $s$ such that

\[
\begin{bmatrix} 3s \\ 2 + 5s \end{bmatrix} = \begin{bmatrix} 6 \\ 14 \end{bmatrix}
\]

That is, $s = 2$ and $s = 12/5$, which is impossible. So $2\mathbf{u}$ is not in $H$ and $H$ is not a vector space.

2. $\mathbf{v}_1 = 1\mathbf{v}_1 + 0\mathbf{v}_2 + \cdots + 0\mathbf{v}_p$. This expresses $\mathbf{v}_1$ as a linear combination of $\mathbf{v}_1, \dots, \mathbf{v}_p$, so $\mathbf{v}_1$ is in $W$. In general, $\mathbf{v}_k$ is in $W$ because

\[
\mathbf{v}_k = 0\mathbf{v}_1 + \cdots + 0\mathbf{v}_{k-1} + 1\mathbf{v}_k + 0\mathbf{v}_{k+1} + \cdots + 0\mathbf{v}_p
\]

3. The subset $S$ is a subspace of $M_{3 \times 3}$ since it satisfies all three of the requirements listed in the definition of a subspace:

a. Observe that the $0$ in $M_{3 \times 3}$ is the $3 \times 3$ zero matrix and since $0^T = 0$, the matrix $0$ is symmetric and hence $0$ is in $S$.

b. Let $A$ and $B$ be in $S$. Notice that $A$ and $B$ are $3 \times 3$ symmetric matrices, so $A^T = A$ and $B^T = B$. By the properties of transposes of matrices, $(A + B)^T = A^T + B^T = A + B$. Thus $A + B$ is symmetric and hence $A + B$ is in $S$.

c. Let $A$ be in $S$ and let $c$ be a scalar. Since $A$ is symmetric, by the properties of transposes of matrices, $(cA)^T = c(A^T) = cA$. Thus $cA$ is also a symmetric matrix and hence $cA$ is in $S$.


4.2 NULL SPACES, COLUMN SPACES, AND LINEAR TRANSFORMATIONS 


In applications of linear algebra, subspaces of $\mathbb{R}^n$ usually arise in one of two ways: (1) as the set of all solutions to a system of homogeneous linear equations or (2) as the set of all linear combinations of certain specified vectors. In this section, we compare and contrast these two descriptions of subspaces, allowing us to practice using the concept of a subspace. Actually, as you will soon discover, we have been working with subspaces ever since Section 1.3. The main new feature here is the terminology. The section concludes with a discussion of the kernel and range of a linear transformation.


The Null Space of a Matrix 


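Consider the following system of homogeneous equations:

\[
\begin{aligned}
x_1 - 3x_2 - 2x_3 &= 0 \\
-5x_1 + 9x_2 + x_3 &= 0
\end{aligned} \tag{1}
\]

In matrix form, this system is written as $A\mathbf{x} = \mathbf{0}$, where

\[
A = \begin{bmatrix} 1 & -3 & -2 \\ -5 & 9 & 1 \end{bmatrix} \tag{2}
\]

Recall that the set of all $\mathbf{x}$ that satisfy (1) is called the solution set of the system (1). Often it is convenient to relate this set directly to the matrix $A$ and the equation $A\mathbf{x} = \mathbf{0}$. We call the set of $\mathbf{x}$ that satisfy $A\mathbf{x} = \mathbf{0}$ the null space of the matrix $A$.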


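DEFINITION

The null space of an $m \times n$ matrix $A$, written as $\operatorname{Nul} A$, is the set of all solutions of the homogeneous equation $A\mathbf{x} = \mathbf{0}$. In set notation,

\[
\operatorname{Nul} A = \{\mathbf{x} : \mathbf{x} \text{ is in } \mathbb{R}^n \text{ and } A\mathbf{x} = \mathbf{0}\}
\]

A more dynamic description of $\operatorname{Nul} A$ is the set of all $\mathbf{x}$ in $\mathbb{R}^n$ that are mapped into the zero vector of $\mathbb{R}^m$ via the linear transformation $\mathbf{x} \mapsto A\mathbf{x}$. See Figure 1.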



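EXAMPLE 1 Let $A$ be the matrix in (2) above, and let $\mathbf{u} = \begin{bmatrix} 5 \\ 3 \\ -2 \end{bmatrix}$. Determine if $\mathbf{u}$ belongs to the null space of $A$.

SOLUTION To test if $\mathbf{u}$ satisfies $A\mathbf{u} = \mathbf{0}$, simply compute

\[
A\mathbf{u} = \begin{bmatrix} 1 & -3 & -2 \\ -5 & 9 & 1 \end{bmatrix} \begin{bmatrix} 5 \\ 3 \\ -2 \end{bmatrix}
= \begin{bmatrix} 5 - 9 + 4 \\ -25 + 27 - 2 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\]

Thus $\mathbf{u}$ is in $\operatorname{Nul} A$. ■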
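The membership test in Example 1 is a single matrix-vector product, so it is easy to check by machine. The following is a minimal sketch in plain Python; the helper names `mat_vec` and `in_null_space` are our own illustrative choices, and the entries of $A$ and $\mathbf{u}$ are the ones used in Example 1.

```python
# Matrix A from (2) and vector u from Example 1, stored as plain Python lists.
A = [[1, -3, -2],
     [-5, 9, 1]]
u = [5, 3, -2]

def mat_vec(A, x):
    """Return the product Ax as a list (one dot product per row of A)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def in_null_space(A, x):
    """True when Ax is the zero vector, that is, when x is in Nul A."""
    return all(entry == 0 for entry in mat_vec(A, x))

print(mat_vec(A, u))        # [0, 0]
print(in_null_space(A, u))  # True
```

Exact integer arithmetic is used here on purpose; with floating-point entries one would compare against a small tolerance instead of testing for exact zeros.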


The term space in null space is appropriate because the null space of a matrix is a vector space, as shown in the next theorem.

THEOREM 2

The null space of an $m \times n$ matrix $A$ is a subspace of $\mathbb{R}^n$. Equivalently, the set of all solutions to a system $A\mathbf{x} = \mathbf{0}$ of $m$ homogeneous linear equations in $n$ unknowns is a subspace of $\mathbb{R}^n$.

PROOF Certainly $\operatorname{Nul} A$ is a subset of $\mathbb{R}^n$ because $A$ has $n$ columns. We must show that $\operatorname{Nul} A$ satisfies the three properties of a subspace. Of course, $\mathbf{0}$ is in $\operatorname{Nul} A$. Next, let $\mathbf{u}$ and $\mathbf{v}$ represent any two vectors in $\operatorname{Nul} A$. Then

\[
A\mathbf{u} = \mathbf{0} \quad \text{and} \quad A\mathbf{v} = \mathbf{0}
\]

To show that $\mathbf{u} + \mathbf{v}$ is in $\operatorname{Nul} A$, we must show that $A(\mathbf{u} + \mathbf{v}) = \mathbf{0}$. Using a property of matrix multiplication, compute

\[
A(\mathbf{u} + \mathbf{v}) = A\mathbf{u} + A\mathbf{v} = \mathbf{0} + \mathbf{0} = \mathbf{0}
\]

Thus $\mathbf{u} + \mathbf{v}$ is in $\operatorname{Nul} A$, and $\operatorname{Nul} A$ is closed under vector addition. Finally, if $c$ is any scalar, then

\[
A(c\mathbf{u}) = c(A\mathbf{u}) = c(\mathbf{0}) = \mathbf{0}
\]

which shows that $c\mathbf{u}$ is in $\operatorname{Nul} A$. Thus $\operatorname{Nul} A$ is a subspace of $\mathbb{R}^n$. ■

EXAMPLE 2 Let $H$ be the set of all vectors in $\mathbb{R}^4$ whose coordinates $a$, $b$, $c$, $d$ satisfy the equations $a - 2b + 5c = d$ and $c - a = b$. Show that $H$ is a subspace of $\mathbb{R}^4$.

SOLUTION Rearrange the equations that describe the elements of $H$, and note that $H$ is the set of all solutions of the following system of homogeneous linear equations:

\[
\begin{aligned}
a - 2b + 5c - d &= 0 \\
-a - b + c &= 0
\end{aligned}
\]

By Theorem 2, $H$ is a subspace of $\mathbb{R}^4$. ■

It is important that the linear equations defining the set $H$ are homogeneous. Otherwise, the set of solutions will definitely not be a subspace (because the zero vector is not a solution of a nonhomogeneous system). Also, in some cases, the set of solutions could be empty.


An Explicit Description of Nul A 

There is no obvious relation between vectors in $\operatorname{Nul} A$ and the entries in $A$. We say that $\operatorname{Nul} A$ is defined implicitly, because it is defined by a condition that must be checked. No explicit list or description of the elements in $\operatorname{Nul} A$ is given. However, solving the equation $A\mathbf{x} = \mathbf{0}$ amounts to producing an explicit description of $\operatorname{Nul} A$. The next example reviews the procedure from Section 1.5.


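EXAMPLE 3 Find a spanning set for the null space of the matrix

\[
A = \begin{bmatrix} -3 & 6 & -1 & 1 & -7 \\ 1 & -2 & 2 & 3 & -1 \\ 2 & -4 & 5 & 8 & -4 \end{bmatrix}
\]

SOLUTION The first step is to find the general solution of $A\mathbf{x} = \mathbf{0}$ in terms of free variables. Row reduce the augmented matrix $[\,A \;\; \mathbf{0}\,]$ to reduced echelon form in order to write the basic variables in terms of the free variables:

\[
\begin{bmatrix} 1 & -2 & 0 & -1 & 3 & 0 \\ 0 & 0 & 1 & 2 & -2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix},
\qquad
\begin{aligned}
x_1 - 2x_2 - x_4 + 3x_5 &= 0 \\
x_3 + 2x_4 - 2x_5 &= 0 \\
0 &= 0
\end{aligned}
\]

The general solution is $x_1 = 2x_2 + x_4 - 3x_5$, $x_3 = -2x_4 + 2x_5$, with $x_2$, $x_4$, and $x_5$ free. Next, decompose the vector giving the general solution into a linear combination of vectors where the weights are the free variables. That is,

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}
= \begin{bmatrix} 2x_2 + x_4 - 3x_5 \\ x_2 \\ -2x_4 + 2x_5 \\ x_4 \\ x_5 \end{bmatrix}
= x_2 \underbrace{\begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}}_{\mathbf{u}}
+ x_4 \underbrace{\begin{bmatrix} 1 \\ 0 \\ -2 \\ 1 \\ 0 \end{bmatrix}}_{\mathbf{v}}
+ x_5 \underbrace{\begin{bmatrix} -3 \\ 0 \\ 2 \\ 0 \\ 1 \end{bmatrix}}_{\mathbf{w}}
= x_2\mathbf{u} + x_4\mathbf{v} + x_5\mathbf{w} \tag{3}
\]

Every linear combination of $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ is an element of $\operatorname{Nul} A$ and vice versa. Thus $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$ is a spanning set for $\operatorname{Nul} A$. ■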
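The procedure of Example 3 (row reduce, identify the free variables, then read off one spanning vector per free variable) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `rref` and `null_space_basis` are hypothetical helpers written for this sketch, exact rational arithmetic from the standard `fractions` module is used to avoid rounding, and the matrix entries are taken to be those of Example 3 (they are consistent with the reduced echelon form shown there).

```python
from fractions import Fraction

def rref(M):
    """Return (R, pivots): the reduced echelon form of M and its pivot columns."""
    R = [[Fraction(x) for x in row] for row in M]
    pivots, r = [], 0
    for c in range(len(R[0])):
        # Look for a row at or below position r with a nonzero entry in column c.
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue                      # no pivot here: column c is a free column
        R[r], R[p] = R[p], R[r]
        R[r] = [x / R[r][c] for x in R[r]]        # scale the pivot to 1
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                factor = R[i][c]
                R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == len(R):
            break
    return R, pivots

def null_space_basis(A):
    """One spanning vector of Nul A per free variable, as in Example 3."""
    R, pivots = rref(A)
    n = len(A[0])
    basis = []
    for f in (j for j in range(n) if j not in pivots):
        x = [Fraction(0)] * n
        x[f] = Fraction(1)                # set this free variable to 1, the others to 0
        for row, p in zip(R, pivots):
            x[p] = -row[f]                # solve for each basic variable
        basis.append(x)
    return basis

A = [[-3, 6, -1, 1, -7],
     [1, -2, 2, 3, -1],
     [2, -4, 5, 8, -4]]

for vec in null_space_basis(A):
    print([int(t) for t in vec])
# [2, 1, 0, 0, 0]
# [1, 0, -2, 1, 0]
# [-3, 0, 2, 0, 1]
```

The three printed vectors are the vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ of equation (3), one for each of the free variables $x_2$, $x_4$, $x_5$.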

Two points should be made about the solution of Example 3 that apply to all 
problems of this type where Nul A contains nonzero vectors. We will use these facts 
later. 

1. The spanning set produced by the method in Example 3 is automatically linearly independent because the free variables are the weights on the spanning vectors. For instance, look at the 2nd, 4th, and 5th entries in the solution vector in (3) and note that $x_2\mathbf{u} + x_4\mathbf{v} + x_5\mathbf{w}$ can be $\mathbf{0}$ only if the weights $x_2$, $x_4$, and $x_5$ are all zero.

2. When $\operatorname{Nul} A$ contains nonzero vectors, the number of vectors in the spanning set for $\operatorname{Nul} A$ equals the number of free variables in the equation $A\mathbf{x} = \mathbf{0}$.


The Column Space of a Matrix 

Another important subspace associated with a matrix is its column space. Unlike the 
null space, the column space is defined explicitly via linear combinations. 


DEFINITION 


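The column space of an $m \times n$ matrix $A$, written as $\operatorname{Col} A$, is the set of all linear combinations of the columns of $A$. If $A = [\,\mathbf{a}_1 \; \cdots \; \mathbf{a}_n\,]$, then

\[
\operatorname{Col} A = \operatorname{Span}\{\mathbf{a}_1, \dots, \mathbf{a}_n\}
\]

Since $\operatorname{Span}\{\mathbf{a}_1, \dots, \mathbf{a}_n\}$ is a subspace, by Theorem 1, the next theorem follows from the definition of $\operatorname{Col} A$ and the fact that the columns of $A$ are in $\mathbb{R}^m$.

THEOREM 3

The column space of an $m \times n$ matrix $A$ is a subspace of $\mathbb{R}^m$.

Note that a typical vector in $\operatorname{Col} A$ can be written as $A\mathbf{x}$ for some $\mathbf{x}$ because the notation $A\mathbf{x}$ stands for a linear combination of the columns of $A$. That is,

\[
\operatorname{Col} A = \{\mathbf{b} : \mathbf{b} = A\mathbf{x} \text{ for some } \mathbf{x} \text{ in } \mathbb{R}^n\}
\]

The notation $A\mathbf{x}$ for vectors in $\operatorname{Col} A$ also shows that $\operatorname{Col} A$ is the range of the linear transformation $\mathbf{x} \mapsto A\mathbf{x}$. We will return to this point of view at the end of the section.

EXAMPLE 4 Find a matrix $A$ such that $W = \operatorname{Col} A$.

\[
W = \left\{ \begin{bmatrix} 6a - b \\ a + b \\ -7a \end{bmatrix} : a, b \text{ in } \mathbb{R} \right\}
\]

SOLUTION First, write $W$ as a set of linear combinations.

\[
W = \left\{ a \begin{bmatrix} 6 \\ 1 \\ -7 \end{bmatrix} + b \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} : a, b \text{ in } \mathbb{R} \right\}
= \operatorname{Span}\left\{ \begin{bmatrix} 6 \\ 1 \\ -7 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} \right\}
\]

Second, use the vectors in the spanning set as the columns of $A$. Let $A = \begin{bmatrix} 6 & -1 \\ 1 & 1 \\ -7 & 0 \end{bmatrix}$. Then $W = \operatorname{Col} A$, as desired. ■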

Recall from Theorem 4 in Section 1.4 that the columns of $A$ span $\mathbb{R}^m$ if and only if the equation $A\mathbf{x} = \mathbf{b}$ has a solution for each $\mathbf{b}$. We can restate this fact as follows:


The column space of an $m \times n$ matrix $A$ is all of $\mathbb{R}^m$ if and only if the equation $A\mathbf{x} = \mathbf{b}$ has a solution for each $\mathbf{b}$ in $\mathbb{R}^m$.


The Contrast Between Nul A and Col A 


It is natural to wonder how the null space and column space of a matrix are related. 
In fact, the two spaces are quite dissimilar, as Examples 5-7 will show. Nevertheless, 
a surprising connection between the null space and column space will emerge in 
Section 4.6, after more theory is available. 


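EXAMPLE 5 Let

\[
A = \begin{bmatrix} 2 & 4 & -2 & 1 \\ -2 & -5 & 7 & 3 \\ 3 & 7 & -8 & 6 \end{bmatrix}
\]

a. If the column space of $A$ is a subspace of $\mathbb{R}^k$, what is $k$?

b. If the null space of $A$ is a subspace of $\mathbb{R}^k$, what is $k$?

SOLUTION

a. The columns of $A$ each have three entries, so $\operatorname{Col} A$ is a subspace of $\mathbb{R}^k$, where $k = 3$.

b. A vector $\mathbf{x}$ such that $A\mathbf{x}$ is defined must have four entries, so $\operatorname{Nul} A$ is a subspace of $\mathbb{R}^k$, where $k = 4$. ■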


When a matrix is not square, as in Example 5, the vectors in $\operatorname{Nul} A$ and $\operatorname{Col} A$ live in entirely different “universes.” For example, no linear combination of vectors in $\mathbb{R}^3$ can produce a vector in $\mathbb{R}^4$. When $A$ is square, $\operatorname{Nul} A$ and $\operatorname{Col} A$ do have the zero vector in common, and in special cases it is possible that some nonzero vectors belong to both $\operatorname{Nul} A$ and $\operatorname{Col} A$.

EXAMPLE 6 With $A$ as in Example 5, find a nonzero vector in $\operatorname{Col} A$ and a nonzero vector in $\operatorname{Nul} A$.

SOLUTION It is easy to find a vector in $\operatorname{Col} A$. Any column of $A$ will do, say, $\begin{bmatrix} 2 \\ -2 \\ 3 \end{bmatrix}$. To find a nonzero vector in $\operatorname{Nul} A$, row reduce the augmented matrix $[\,A \;\; \mathbf{0}\,]$ and obtain

\[
[\,A \;\; \mathbf{0}\,] \sim \begin{bmatrix} 1 & 0 & 9 & 0 & 0 \\ 0 & 1 & -5 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}
\]

Thus, if $\mathbf{x}$ satisfies $A\mathbf{x} = \mathbf{0}$, then $x_1 = -9x_3$, $x_2 = 5x_3$, $x_4 = 0$, and $x_3$ is free. Assigning a nonzero value to $x_3$, say, $x_3 = 1$, we obtain a vector in $\operatorname{Nul} A$, namely, $\mathbf{x} = (-9, 5, 1, 0)$. ■


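EXAMPLE 7 With $A$ as in Example 5, let $\mathbf{u} = \begin{bmatrix} 3 \\ -2 \\ -1 \\ 0 \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} 3 \\ -1 \\ 3 \end{bmatrix}$.

a. Determine if $\mathbf{u}$ is in $\operatorname{Nul} A$. Could $\mathbf{u}$ be in $\operatorname{Col} A$?

b. Determine if $\mathbf{v}$ is in $\operatorname{Col} A$. Could $\mathbf{v}$ be in $\operatorname{Nul} A$?

SOLUTION

a. An explicit description of $\operatorname{Nul} A$ is not needed here. Simply compute the product $A\mathbf{u}$.

\[
A\mathbf{u} = \begin{bmatrix} 2 & 4 & -2 & 1 \\ -2 & -5 & 7 & 3 \\ 3 & 7 & -8 & 6 \end{bmatrix}
\begin{bmatrix} 3 \\ -2 \\ -1 \\ 0 \end{bmatrix}
= \begin{bmatrix} 0 \\ -3 \\ 3 \end{bmatrix}
\ne \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\]

Obviously, $\mathbf{u}$ is not a solution of $A\mathbf{x} = \mathbf{0}$, so $\mathbf{u}$ is not in $\operatorname{Nul} A$. Also, with four entries, $\mathbf{u}$ could not possibly be in $\operatorname{Col} A$, since $\operatorname{Col} A$ is a subspace of $\mathbb{R}^3$.

b. Reduce $[\,A \;\; \mathbf{v}\,]$ to an echelon form.

\[
[\,A \;\; \mathbf{v}\,] = \begin{bmatrix} 2 & 4 & -2 & 1 & 3 \\ -2 & -5 & 7 & 3 & -1 \\ 3 & 7 & -8 & 6 & 3 \end{bmatrix}
\sim \begin{bmatrix} 2 & 4 & -2 & 1 & 3 \\ 0 & 1 & -5 & -4 & -2 \\ 0 & 0 & 0 & 17 & 1 \end{bmatrix}
\]

At this point, it is clear that the equation $A\mathbf{x} = \mathbf{v}$ is consistent, so $\mathbf{v}$ is in $\operatorname{Col} A$. With only three entries, $\mathbf{v}$ could not possibly be in $\operatorname{Nul} A$, since $\operatorname{Nul} A$ is a subspace of $\mathbb{R}^4$. ■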
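Part (b) of Example 7 rests on the consistency test from Chapter 1: $\mathbf{v}$ is in $\operatorname{Col} A$ exactly when the augmented matrix $[\,A \;\; \mathbf{v}\,]$ has no pivot in its last column. Here is a rough Python sketch of that test. The function names `pivot_columns` and `in_column_space` are our own, the small matrix `B` is a made-up contrast case that does not come from the text, and the entries of $A$ and $\mathbf{v}$ are taken to be those of Examples 5 and 7.

```python
from fractions import Fraction

def pivot_columns(M):
    """Row reduce a copy of M to echelon form and return its pivot column indices."""
    R = [[Fraction(x) for x in row] for row in M]
    pivots, r = [], 0
    for c in range(len(R[0])):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        for i in range(r + 1, len(R)):            # clear the entries below the pivot
            factor = R[i][c] / R[r][c]
            R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == len(R):
            break
    return pivots

def in_column_space(A, v):
    """v is in Col A exactly when [A v] has no pivot in its last column."""
    augmented = [row + [vi] for row, vi in zip(A, v)]
    return len(A[0]) not in pivot_columns(augmented)

A = [[2, 4, -2, 1],
     [-2, -5, 7, 3],
     [3, 7, -8, 6]]
print(in_column_space(A, [3, -1, 3]))   # True, as found in Example 7(b)

# Hypothetical contrast case: the columns of B span only a line in R^2.
B = [[1, 2],
     [2, 4]]
print(in_column_space(B, [1, 0]))       # False
```

Only the pivot positions matter for consistency, so the sketch stops at an echelon form rather than continuing to the reduced echelon form.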

The table on page 206 summarizes what we have learned about Nul A and Col A. 
Item 8 is a restatement of Theorems 11 and 12(a) in Section 1.9. 


Kernel and Range of a Linear Transformation 

Subspaces of vector spaces other than $\mathbb{R}^n$ are often described in terms of a linear transformation instead of a matrix. To make this precise, we generalize the definition given in Section 1.8.

Contrast Between Nul A and Col A for an $m \times n$ Matrix $A$

Nul A

1. $\operatorname{Nul} A$ is a subspace of $\mathbb{R}^n$.

2. $\operatorname{Nul} A$ is implicitly defined; that is, you are given only a condition ($A\mathbf{x} = \mathbf{0}$) that vectors in $\operatorname{Nul} A$ must satisfy.

3. It takes time to find vectors in $\operatorname{Nul} A$. Row operations on $[\,A \;\; \mathbf{0}\,]$ are required.

4. There is no obvious relation between $\operatorname{Nul} A$ and the entries in $A$.

5. A typical vector $\mathbf{v}$ in $\operatorname{Nul} A$ has the property that $A\mathbf{v} = \mathbf{0}$.

6. Given a specific vector $\mathbf{v}$, it is easy to tell if $\mathbf{v}$ is in $\operatorname{Nul} A$. Just compute $A\mathbf{v}$.

7. $\operatorname{Nul} A = \{\mathbf{0}\}$ if and only if the equation $A\mathbf{x} = \mathbf{0}$ has only the trivial solution.

8. $\operatorname{Nul} A = \{\mathbf{0}\}$ if and only if the linear transformation $\mathbf{x} \mapsto A\mathbf{x}$ is one-to-one.

Col A

1. $\operatorname{Col} A$ is a subspace of $\mathbb{R}^m$.

2. $\operatorname{Col} A$ is explicitly defined; that is, you are told how to build vectors in $\operatorname{Col} A$.

3. It is easy to find vectors in $\operatorname{Col} A$. The columns of $A$ are displayed; others are formed from them.

4. There is an obvious relation between $\operatorname{Col} A$ and the entries in $A$, since each column of $A$ is in $\operatorname{Col} A$.

5. A typical vector $\mathbf{v}$ in $\operatorname{Col} A$ has the property that the equation $A\mathbf{x} = \mathbf{v}$ is consistent.

6. Given a specific vector $\mathbf{v}$, it may take time to tell if $\mathbf{v}$ is in $\operatorname{Col} A$. Row operations on $[\,A \;\; \mathbf{v}\,]$ are required.

7. $\operatorname{Col} A = \mathbb{R}^m$ if and only if the equation $A\mathbf{x} = \mathbf{b}$ has a solution for every $\mathbf{b}$ in $\mathbb{R}^m$.

8. $\operatorname{Col} A = \mathbb{R}^m$ if and only if the linear transformation $\mathbf{x} \mapsto A\mathbf{x}$ maps $\mathbb{R}^n$ onto $\mathbb{R}^m$.


DEFINITION A linear transformation $T$ from a vector space $V$ into a vector space $W$ is a rule that assigns to each vector $\mathbf{x}$ in $V$ a unique vector $T(\mathbf{x})$ in $W$, such that

(i) $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$ for all $\mathbf{u}$, $\mathbf{v}$ in $V$, and

(ii) $T(c\mathbf{u}) = cT(\mathbf{u})$ for all $\mathbf{u}$ in $V$ and all scalars $c$.


The kernel (or null space) of such a $T$ is the set of all $\mathbf{u}$ in $V$ such that $T(\mathbf{u}) = \mathbf{0}$ (the zero vector in $W$). The range of $T$ is the set of all vectors in $W$ of the form $T(\mathbf{x})$ for some $\mathbf{x}$ in $V$. If $T$ happens to arise as a matrix transformation, say, $T(\mathbf{x}) = A\mathbf{x}$ for some matrix $A$, then the kernel and the range of $T$ are just the null space and the column space of $A$, as defined earlier.

It is not difficult to show that the kernel of $T$ is a subspace of $V$. The proof is essentially the same as the one for Theorem 2. Also, the range of $T$ is a subspace of $W$. See Figure 2 and Exercise 30.



FIGURE 2 Subspaces associated with a 
linear transformation. 


In applications, a subspace usually arises as either the kernel or the range of an appropriate linear transformation. For instance, the set of all solutions of a homogeneous linear differential equation turns out to be the kernel of a linear transformation. Typically, such a linear transformation is described in terms of one or more derivatives of a function. To explain this in any detail would take us too far afield at this point. So we consider only two examples. The first explains why the operation of differentiation is a linear transformation.

EXAMPLE 8 (Calculus required) Let $V$ be the vector space of all real-valued functions $f$ defined on an interval $[a, b]$ with the property that they are differentiable and their derivatives are continuous functions on $[a, b]$. Let $W$ be the vector space $C[a, b]$ of all continuous functions on $[a, b]$, and let $D : V \to W$ be the transformation that changes $f$ in $V$ into its derivative $f'$. In calculus, two simple differentiation rules are

\[
D(f + g) = D(f) + D(g) \quad \text{and} \quad D(cf) = cD(f)
\]

That is, $D$ is a linear transformation. It can be shown that the kernel of $D$ is the set of constant functions on $[a, b]$ and the range of $D$ is the set $W$ of all continuous functions on $[a, b]$. ■
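The two differentiation rules in Example 8 can be spot-checked on polynomials, which form a small subset of the space $V$. The sketch below is only an illustration, not a proof: it represents a polynomial by its list of coefficients (an assumption made for this example), and the helper names `D`, `add`, and `scale` are our own.

```python
def D(p):
    """Differentiate a polynomial stored as coefficients [a0, a1, a2, ...]."""
    return [k * a for k, a in enumerate(p)][1:] or [0]

def add(p, q):
    """Add two polynomials, padding the shorter coefficient list with zeros."""
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def scale(c, p):
    """Multiply every coefficient of p by the scalar c."""
    return [c * a for a in p]

f = [1, 2, 3]      # 1 + 2t + 3t^2
g = [4, 0, 0, 5]   # 4 + 5t^3

print(D(add(f, g)) == add(D(f), D(g)))    # True: D(f + g) = D(f) + D(g)
print(D(scale(7, f)) == scale(7, D(f)))   # True: D(cf) = cD(f)
print(D([9]))                             # [0]: constants lie in the kernel of D
```

The last line reflects the statement about the kernel of $D$ on this subset: a constant polynomial is sent to the zero polynomial.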

EXAMPLE 9 (Calculus required) The differential equation

\[
y'' + \omega^2 y = 0 \tag{4}
\]

where $\omega$ is a constant, is used to describe a variety of physical systems, such as the vibration of a weighted spring, the movement of a pendulum, and the voltage in an inductance-capacitance electrical circuit. The set of solutions of (4) is precisely the kernel of the linear transformation that maps a function $y = f(t)$ into the function $f''(t) + \omega^2 f(t)$. Finding an explicit description of this vector space is a problem in differential equations. The solution set turns out to be the space described in Exercise 19 in Section 4.1. ■
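One direction of the last claim is a short calculation: every function of the form given in Exercise 19 of Section 4.1 lies in the kernel described in Example 9. The derivation below checks only this inclusion; showing that there are no other solutions requires the theory of differential equations and is not attempted here.

```latex
\begin{aligned}
y(t)   &= c_1 \cos \omega t + c_2 \sin \omega t \\
y'(t)  &= -c_1 \omega \sin \omega t + c_2 \omega \cos \omega t \\
y''(t) &= -c_1 \omega^2 \cos \omega t - c_2 \omega^2 \sin \omega t = -\omega^2 y(t) \\
y''(t) + \omega^2 y(t) &= 0
\end{aligned}
```

So each such $y$ satisfies equation (4) for every choice of the constants $c_1$ and $c_2$.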


PRACTICE PROBLEMS 


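1. Let $W = \left\{ \begin{bmatrix} a \\ b \\ c \end{bmatrix} : a - 3b - c = 0 \right\}$. Show in two different ways that $W$ is a subspace of $\mathbb{R}^3$. (Use two theorems.)

2. Let $A = \begin{bmatrix} 7 & -3 & 5 \\ -4 & 1 & -5 \\ -5 & 2 & -4 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix}$, and $\mathbf{w} = \begin{bmatrix} 7 \\ 6 \\ -3 \end{bmatrix}$. Suppose you know that the equations $A\mathbf{x} = \mathbf{v}$ and $A\mathbf{x} = \mathbf{w}$ are both consistent. What can you say about the equation $A\mathbf{x} = \mathbf{v} + \mathbf{w}$?

3. Let $A$ be an $n \times n$ matrix. If $\operatorname{Col} A = \operatorname{Nul} A$, show that $\operatorname{Nul} A^2 = \mathbb{R}^n$.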


4.2 EXERCISES 



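1. Determine if $\mathbf{w} = \begin{bmatrix} 1 \\ 3 \\ -4 \end{bmatrix}$ is in $\operatorname{Nul} A$, where

\[
A = \begin{bmatrix} 3 & -5 & -3 \\ 6 & -2 & 0 \\ -8 & 4 & 1 \end{bmatrix}
\]

2. Determine if $\mathbf{w} = \begin{bmatrix} 5 \\ -3 \\ 2 \end{bmatrix}$ is in $\operatorname{Nul} A$, where

\[
A = \begin{bmatrix} 5 & 21 & 19 \\ 13 & 23 & 2 \\ 8 & 14 & 1 \end{bmatrix}
\]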

























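In Exercises 3-6, find an explicit description of $\operatorname{Nul} A$ by listing vectors that span the null space.

3. $A = \begin{bmatrix} 1 & 3 & 5 & 0 \\ 0 & 1 & 4 & -2 \end{bmatrix}$

4. $A = \begin{bmatrix} 1 & -6 & 4 & 0 \\ 0 & 0 & 2 & 0 \end{bmatrix}$

5. $A = \begin{bmatrix} 1 & -2 & 0 & 4 & 0 \\ 0 & 0 & 1 & -9 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$

6. $A = \begin{bmatrix} 1 & 5 & -4 & -3 & 1 \\ 0 & 1 & -2 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$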

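In Exercises 7-14, either use an appropriate theorem to show that the given set, $W$, is a vector space, or find a specific example to the contrary.

7. $\left\{ \begin{bmatrix} a \\ b \\ c \end{bmatrix} : a + b + c = 2 \right\}$

8. $\left\{ \begin{bmatrix} r \\ s \\ t \end{bmatrix} : 5r - 1 = s + 2t \right\}$

9. $\left\{ \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} : \begin{aligned} a - 2b &= 4c \\ 2a &= c + 3d \end{aligned} \right\}$

10. $\left\{ \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} : \begin{aligned} a + 3b &= c \\ b + c + a &= d \end{aligned} \right\}$

11. $\left\{ \begin{bmatrix} b - 2d \\ 5 + d \\ b + 3d \\ d \end{bmatrix} : b, d \text{ real} \right\}$

12. $\left\{ \begin{bmatrix} b - 5d \\ 2b \\ 2d + 1 \\ d \end{bmatrix} : b, d \text{ real} \right\}$

13. $\left\{ \begin{bmatrix} c - 6d \\ d \\ c \end{bmatrix} : c, d \text{ real} \right\}$

14. $\left\{ \begin{bmatrix} -a + 2b \\ a - 2b \\ 3a - 6b \end{bmatrix} : a, b \text{ real} \right\}$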

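In Exercises 15 and 16, find $A$ such that the given set is $\operatorname{Col} A$.

15. $\left\{ \begin{bmatrix} 2s + 3t \\ r + s - 2t \\ 4r + s \\ 3r - s - t \end{bmatrix} : r, s, t \text{ real} \right\}$

16. $\left\{ \begin{bmatrix} b - c \\ 2b + c + d \\ 5c - 4d \\ d \end{bmatrix} : b, c, d \text{ real} \right\}$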


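For the matrices in Exercises 17-20, (a) find $k$ such that $\operatorname{Nul} A$ is a subspace of $\mathbb{R}^k$, and (b) find $k$ such that $\operatorname{Col} A$ is a subspace of $\mathbb{R}^k$.

17. $A = \begin{bmatrix} 2 & -6 \\ -1 & 3 \\ -4 & 12 \\ 3 & -9 \end{bmatrix}$

18. $A = \begin{bmatrix} 7 & -2 & 0 \\ -2 & 0 & -5 \\ 0 & -5 & 7 \\ -5 & 7 & -2 \end{bmatrix}$

19. $A = \begin{bmatrix} 4 & 5 & -2 & 6 & 0 \\ 1 & 1 & 0 & 1 & 0 \end{bmatrix}$

20. $A = \begin{bmatrix} 1 & -3 & 9 & 0 & -5 \end{bmatrix}$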


21. With A as in Exercise 17, find a nonzero vector in Nul A and 
a nonzero vector in Col A . 


22. With A as in Exercise 3, find a nonzero vector in Nul A and 
a nonzero vector in Col A . 


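23. Let $A = \begin{bmatrix} -6 & 12 \\ -3 & 6 \end{bmatrix}$ and $\mathbf{w} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$. Determine if $\mathbf{w}$ is in $\operatorname{Col} A$. Is $\mathbf{w}$ in $\operatorname{Nul} A$?

24. Let $A = \begin{bmatrix} -8 & -2 & -9 \\ 6 & 4 & 8 \\ 4 & 0 & 4 \end{bmatrix}$ and $\mathbf{w} = \begin{bmatrix} 2 \\ 1 \\ -2 \end{bmatrix}$. Determine if $\mathbf{w}$ is in $\operatorname{Col} A$. Is $\mathbf{w}$ in $\operatorname{Nul} A$?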

In Exercises 25 and 26, A denotes an m x n matrix. Mark each 
statement True or False. Justify each answer. 

25. a. The null space of $A$ is the solution set of the equation $A\mathbf{x} = \mathbf{0}$.

b. The null space of an $m \times n$ matrix is in $\mathbb{R}^m$.

c. The column space of $A$ is the range of the mapping $\mathbf{x} \mapsto A\mathbf{x}$.

d. If the equation $A\mathbf{x} = \mathbf{b}$ is consistent, then $\operatorname{Col} A$ is $\mathbb{R}^m$.

e. The kernel of a linear transformation is a vector space.

f. $\operatorname{Col} A$ is the set of all vectors that can be written as $A\mathbf{x}$ for some $\mathbf{x}$.


26. a. A null space is a vector space.

b. The column space of an $m \times n$ matrix is in $\mathbb{R}^m$.

c. $\operatorname{Col} A$ is the set of all solutions of $A\mathbf{x} = \mathbf{b}$.

d. $\operatorname{Nul} A$ is the kernel of the mapping $\mathbf{x} \mapsto A\mathbf{x}$.

e. The range of a linear transformation is a vector space.

f. The set of all solutions of a homogeneous linear differential equation is the kernel of a linear transformation.
27. It can be shown that a solution of the system below is $x_1 = 3$, $x_2 = 2$, and $x_3 = -1$. Use this fact and the theory from this section to explain why another solution is $x_1 = 30$, $x_2 = 20$, and $x_3 = -10$. (Observe how the solutions are related, but make no other calculations.)

\[
\begin{aligned}
x_1 - 3x_2 - 3x_3 &= 0 \\
-2x_1 + 4x_2 + 2x_3 &= 0 \\
-x_1 + 5x_2 + 7x_3 &= 0
\end{aligned}
\]

28. Consider the following two systems of equations:

\[
\begin{aligned}
5x_1 + x_2 - 3x_3 &= 0 \\
-9x_1 + 2x_2 + 5x_3 &= 1 \\
4x_1 + x_2 - 6x_3 &= 9
\end{aligned}
\qquad\qquad
\begin{aligned}
5x_1 + x_2 - 3x_3 &= 0 \\
-9x_1 + 2x_2 + 5x_3 &= 5 \\
4x_1 + x_2 - 6x_3 &= 45
\end{aligned}
\]

It can be shown that the first system has a solution. Use this fact and the theory from this section to explain why the second system must also have a solution. (Make no row operations.)

29. Prove Theorem 3 as follows: Given an $m \times n$ matrix $A$, an element in $\operatorname{Col} A$ has the form $A\mathbf{x}$ for some $\mathbf{x}$ in $\mathbb{R}^n$. Let $A\mathbf{x}$ and $A\mathbf{w}$ represent any two vectors in $\operatorname{Col} A$.

a. Explain why the zero vector is in $\operatorname{Col} A$.

b. Show that the vector $A\mathbf{x} + A\mathbf{w}$ is in $\operatorname{Col} A$.

c. Given a scalar $c$, show that $c(A\mathbf{x})$ is in $\operatorname{Col} A$.

30. Let $T : V \to W$ be a linear transformation from a vector space $V$ into a vector space $W$. Prove that the range of $T$ is a subspace of $W$. [Hint: Typical elements of the range have the form $T(\mathbf{x})$ and $T(\mathbf{w})$ for some $\mathbf{x}$, $\mathbf{w}$ in $V$.]

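31. Define $T : \mathbb{P}_2 \to \mathbb{R}^2$ by $T(\mathbf{p}) = \begin{bmatrix} \mathbf{p}(0) \\ \mathbf{p}(1) \end{bmatrix}$. For instance, if $\mathbf{p}(t) = 3 + 5t + 7t^2$, then $T(\mathbf{p}) = \begin{bmatrix} 3 \\ 15 \end{bmatrix}$.

a. Show that $T$ is a linear transformation. [Hint: For arbitrary polynomials $\mathbf{p}$, $\mathbf{q}$ in $\mathbb{P}_2$, compute $T(\mathbf{p} + \mathbf{q})$ and $T(c\mathbf{p})$.]

b. Find a polynomial $\mathbf{p}$ in $\mathbb{P}_2$ that spans the kernel of $T$, and describe the range of $T$.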

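32. Define a linear transformation $T : \mathbb{P}_2 \to \mathbb{R}^2$ by $T(\mathbf{p}) = \begin{bmatrix} \mathbf{p}(0) \\ \mathbf{p}(0) \end{bmatrix}$. Find polynomials $\mathbf{p}_1$ and $\mathbf{p}_2$ in $\mathbb{P}_2$ that span the kernel of $T$, and describe the range of $T$.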


33. Let $M_{2 \times 2}$ be the vector space of all $2 \times 2$ matrices, and define $T : M_{2 \times 2} \to M_{2 \times 2}$ by $T(A) = A + A^T$, where $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$.

a. Show that $T$ is a linear transformation.

b. Let $B$ be any element of $M_{2 \times 2}$ such that $B^T = B$. Find an $A$ in $M_{2 \times 2}$ such that $T(A) = B$.

c. Show that the range of $T$ is the set of $B$ in $M_{2 \times 2}$ with the property that $B^T = B$.

d. Describe the kernel of $T$.


34. (Calculus required) Define $T : C[0, 1] \to C[0, 1]$ as follows: For $\mathbf{f}$ in $C[0, 1]$, let $T(\mathbf{f})$ be the antiderivative $\mathbf{F}$ of $\mathbf{f}$ such that $\mathbf{F}(0) = 0$. Show that $T$ is a linear transformation, and describe the kernel of $T$. (See the notation in Exercise 20 of Section 4.1.)

35. Let $V$ and $W$ be vector spaces, and let $T : V \to W$ be a linear transformation. Given a subspace $U$ of $V$, let $T(U)$ denote the set of all images of the form $T(\mathbf{x})$, where $\mathbf{x}$ is in $U$. Show that $T(U)$ is a subspace of $W$.

36. Given $T : V \to W$ as in Exercise 35, and given a subspace $Z$ of $W$, let $U$ be the set of all $\mathbf{x}$ in $V$ such that $T(\mathbf{x})$ is in $Z$. Show that $U$ is a subspace of $V$.


37. [M] Determine whether w is in the column space of A, the 
null space of A, or both, where 


1 

_1 


7 

6 

-4 

1 

1 

, A = 

-5 

-1 

0 

-2 

-1 

9 

-11 

7 

-3 

1 

1 _ 


0 

_ 1 

-9 

7 

1 


38. [M] Determine whether w is in the column space of A, the 
null space of A, or both, where 


1 

_ 1 


OO 

1_ 

5 

-2 

1 

0 

2 

, A = 

-5 

2 

1 

-2 

1 

10 

OO 

6 

-3 

1 

0 

1 _ 


3 

-2 

1 

1 

0 


39. [M] Let $\mathbf{a}_1, \dots, \mathbf{a}_5$ denote the columns of the matrix $A$, where



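$B = [\,\mathbf{a}_1 \;\; \mathbf{a}_2 \;\; \mathbf{a}_4\,]$

a. Explain why $\mathbf{a}_3$ and $\mathbf{a}_5$ are in the column space of $B$.

b. Find a set of vectors that spans $\operatorname{Nul} A$.

c. Let $T : \mathbb{R}^5 \to \mathbb{R}^4$ be defined by $T(\mathbf{x}) = A\mathbf{x}$. Explain why $T$ is neither one-to-one nor onto.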


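40. [M] Let $H = \operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$ and $K = \operatorname{Span}\{\mathbf{v}_3, \mathbf{v}_4\}$, where

$$\mathbf{v}_1 = \begin{bmatrix} 5 \\ 3 \\ 8 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 1 \\ 3 \\ 4 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} 2 \\ -1 \\ 5 \end{bmatrix}, \quad \mathbf{v}_4 = \begin{bmatrix} 0 \\ -12 \\ -28 \end{bmatrix}$$

Then $H$ and $K$ are subspaces of $\mathbb{R}^3$. In fact, $H$ and $K$ are planes in $\mathbb{R}^3$ through the origin, and they intersect in a line through $\mathbf{0}$. Find a nonzero vector $\mathbf{w}$ that generates that line. [Hint: $\mathbf{w}$ can be written as $c_1\mathbf{v}_1 + c_2\mathbf{v}_2$ and also as $c_3\mathbf{v}_3 + c_4\mathbf{v}_4$. To build $\mathbf{w}$, solve the equation $c_1\mathbf{v}_1 + c_2\mathbf{v}_2 = c_3\mathbf{v}_3 + c_4\mathbf{v}_4$ for the unknown $c_j$'s.]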



Mastering: Vector Space, Subspace, 
Col A, and Nul A 4-6 


SOLUTIONS TO PRACTICE PROBLEMS 

1. First method: $W$ is a subspace of $\mathbb{R}^3$ by Theorem 2 because $W$ is the set of all solutions to a system of homogeneous linear equations (where the system has only one equation). Equivalently, $W$ is the null space of the $1 \times 3$ matrix $A = [\,1 \ \ {-3} \ \ {-1}\,]$.

Second method: Solve the equation $a - 3b - c = 0$ for the leading variable $a$ in terms of the free variables $b$ and $c$. Any solution has the form $\begin{bmatrix} 3b + c \\ b \\ c \end{bmatrix}$, where $b$ and $c$ are arbitrary, and

$$\begin{bmatrix} 3b + c \\ b \\ c \end{bmatrix} = b\underbrace{\begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix}}_{\mathbf{v}_1} + c\underbrace{\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}}_{\mathbf{v}_2}$$

This calculation shows that $W = \operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$. Thus $W$ is a subspace of $\mathbb{R}^3$ by Theorem 1. We could also solve the equation $a - 3b - c = 0$ for $b$ or $c$ and get alternative descriptions of $W$ as a set of linear combinations of two vectors.


2. Both v and w are in Col A. Since Col A is a vector space, v + w must be in Col A. 
That is, the equation Ax = v + w is consistent. 


3. Let $\mathbf{x}$ be any vector in $\mathbb{R}^n$. Notice $A\mathbf{x}$ is in Col $A$, since it is a linear combination of the columns of $A$. Since Col $A$ = Nul $A$, the vector $A\mathbf{x}$ is also in Nul $A$. Hence $A^2\mathbf{x} = A(A\mathbf{x}) = \mathbf{0}$, establishing that every vector $\mathbf{x}$ from $\mathbb{R}^n$ is in Nul $A^2$.
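
A concrete instance may make the argument in Practice Problem 3 easier to picture. The sketch below uses a hypothetical $2 \times 2$ matrix (an assumption chosen for illustration, not taken from the text) whose column space and null space are both the line spanned by $(1, 0)$, and it checks that the square of the matrix is the zero matrix.

```python
# Hypothetical matrix (an assumption for illustration): Col A and Nul A
# are both the line spanned by (1, 0).
A = [[0, 1],
     [0, 0]]

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def matvec(X, v):
    """Multiply a matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in X]

# Both columns of A are multiples of (1, 0), so Col A = Span{(1, 0)}.
# A sends (1, 0) to 0 and Nul A is one-dimensional, so Nul A = Span{(1, 0)}.
assert matvec(A, [1, 0]) == [0, 0]

# Because Col A = Nul A, applying A twice sends every vector to 0.
A_squared = matmul(A, A)
print(A_squared)
```

Any other matrix with Col $A$ = Nul $A$ would serve equally well; this one is simply the smallest example.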


4.3 LINEARLY INDEPENDENT SETS; BASES 


In this section we identify and study the subsets that span a vector space $V$ or a subspace $H$ as “efficiently” as possible. The key idea is that of linear independence, defined as in $\mathbb{R}^n$.

An indexed set of vectors $\{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ in $V$ is said to be linearly independent if the vector equation

$$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_p\mathbf{v}_p = \mathbf{0} \qquad (1)$$

has only the trivial solution, $c_1 = 0, \dots, c_p = 0$.$^1$

The set $\{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ is said to be linearly dependent if (1) has a nontrivial solution, that is, if there are some weights, $c_1, \dots, c_p$, not all zero, such that (1) holds. In such a case, (1) is called a linear dependence relation among $\mathbf{v}_1, \dots, \mathbf{v}_p$.

Just as in $\mathbb{R}^n$, a set containing a single vector $\mathbf{v}$ is linearly independent if and only if $\mathbf{v} \neq \mathbf{0}$. Also, a set of two vectors is linearly dependent if and only if one of the vectors is a multiple of the other. And any set containing the zero vector is linearly dependent. The following theorem has the same proof as Theorem 7 in Section 1.7.


THEOREM 4 An indexed set $\{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ of two or more vectors, with $\mathbf{v}_1 \neq \mathbf{0}$, is linearly dependent if and only if some $\mathbf{v}_j$ (with $j > 1$) is a linear combination of the preceding vectors, $\mathbf{v}_1, \dots, \mathbf{v}_{j-1}$.


The main difference between linear dependence in $\mathbb{R}^n$ and in a general vector space is that when the vectors are not $n$-tuples, the homogeneous equation (1) usually cannot be written as a system of $n$ linear equations. That is, the vectors cannot be made into the columns of a matrix $A$ in order to study the equation $A\mathbf{x} = \mathbf{0}$. We must rely instead on the definition of linear dependence and on Theorem 4.


$^1$ It is convenient to use $c_1, \dots, c_p$ in (1) for the scalars instead of $x_1, \dots, x_p$, as we did in Chapter 1.

EXAMPLE 1 Let $\mathbf{p}_1(t) = 1$, $\mathbf{p}_2(t) = t$, and $\mathbf{p}_3(t) = 4 - t$. Then $\{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$ is linearly dependent in $\mathbb{P}$ because $\mathbf{p}_3 = 4\mathbf{p}_1 - \mathbf{p}_2$. ■

EXAMPLE 2 The set $\{\sin t, \cos t\}$ is linearly independent in $C[0, 1]$, the space of all continuous functions on $0 \le t \le 1$, because $\sin t$ and $\cos t$ are not multiples of one another as vectors in $C[0, 1]$. That is, there is no scalar $c$ such that $\cos t = c \cdot \sin t$ for all $t$ in $[0, 1]$. (Look at the graphs of $\sin t$ and $\cos t$.) However, $\{\sin t \cos t, \sin 2t\}$ is linearly dependent because of the identity $\sin 2t = 2 \sin t \cos t$, for all $t$. ■


DEFINITION Let $H$ be a subspace of a vector space $V$. An indexed set of vectors $\mathcal{B} = \{\mathbf{b}_1, \dots, \mathbf{b}_p\}$ in $V$ is a basis for $H$ if

(i) $\mathcal{B}$ is a linearly independent set, and

(ii) the subspace spanned by $\mathcal{B}$ coincides with $H$; that is,

$$H = \operatorname{Span}\{\mathbf{b}_1, \dots, \mathbf{b}_p\}$$


The definition of a basis applies to the case when $H = V$, because any vector space is a subspace of itself. Thus a basis of $V$ is a linearly independent set that spans $V$. Observe that when $H \neq V$, condition (ii) includes the requirement that each of the vectors $\mathbf{b}_1, \dots, \mathbf{b}_p$ must belong to $H$, because $\operatorname{Span}\{\mathbf{b}_1, \dots, \mathbf{b}_p\}$ contains $\mathbf{b}_1, \dots, \mathbf{b}_p$, as shown in Section 4.1.


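EXAMPLE 3 Let $A$ be an invertible $n \times n$ matrix, say $A = [\,\mathbf{a}_1 \ \cdots \ \mathbf{a}_n\,]$. Then the columns of $A$ form a basis for $\mathbb{R}^n$ because they are linearly independent and they span $\mathbb{R}^n$, by the Invertible Matrix Theorem. ■


FIGURE 1 The standard basis for $\mathbb{R}^3$.


EXAMPLE 4 Let $\mathbf{e}_1, \dots, \mathbf{e}_n$ be the columns of the $n \times n$ identity matrix, $I_n$. That is,

$$\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \dots, \quad \mathbf{e}_n = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}$$

The set $\{\mathbf{e}_1, \dots, \mathbf{e}_n\}$ is called the standard basis for $\mathbb{R}^n$ (Figure 1). ■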



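EXAMPLE 5 Let $\mathbf{v}_1 = \begin{bmatrix} 3 \\ 0 \\ -6 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} -4 \\ 1 \\ 7 \end{bmatrix}$, and $\mathbf{v}_3 = \begin{bmatrix} -2 \\ 1 \\ 5 \end{bmatrix}$. Determine if $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a basis for $\mathbb{R}^3$.


SOLUTION Since there are exactly three vectors here in $\mathbb{R}^3$, we can use any of several methods to determine if the matrix $A = [\,\mathbf{v}_1 \ \mathbf{v}_2 \ \mathbf{v}_3\,]$ is invertible. For instance, two row replacements reveal that $A$ has three pivot positions. Thus $A$ is invertible. As in Example 3, the columns of $A$ form a basis for $\mathbb{R}^3$. ■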
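
The pivot count in Example 5 can be checked mechanically. The following is a minimal sketch, assuming exact rational arithmetic; the helper name `pivot_columns` is hypothetical, and the vectors are read from Example 5 as $\mathbf{v}_1 = (3, 0, -6)$, $\mathbf{v}_2 = (-4, 1, 7)$, $\mathbf{v}_3 = (-2, 1, 5)$.

```python
from fractions import Fraction

def pivot_columns(rows):
    """Reduce a matrix (list of rows) to echelon form; return pivot column indices."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        # Row replacements: clear the entries below the pivot.
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return pivots

# The columns of A are v1, v2, v3.
A = [[ 3, -4, -2],
     [ 0,  1,  1],
     [-6,  7,  5]]

pivots = pivot_columns(A)
is_basis = len(pivots) == 3   # three pivots in a 3x3 matrix means A is invertible
print(pivots, is_basis)
```

Exact fractions are used instead of floating point so that a zero entry is recognized as exactly zero.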


EXAMPLE 6 Let $S = \{1, t, t^2, \dots, t^n\}$. Verify that $S$ is a basis for $\mathbb{P}_n$. This basis is called the standard basis for $\mathbb{P}_n$.

SOLUTION Certainly $S$ spans $\mathbb{P}_n$. To show that $S$ is linearly independent, suppose that $c_0, \dots, c_n$ satisfy

$$c_0 \cdot 1 + c_1 t + c_2 t^2 + \cdots + c_n t^n = \mathbf{0}(t) \qquad (2)$$

This equality means that the polynomial on the left has the same values as the zero polynomial on the right. A fundamental theorem in algebra says that the only polynomial in $\mathbb{P}_n$ with more than $n$ zeros is the zero polynomial. That is, equation (2) holds for all $t$ only if $c_0 = \cdots = c_n = 0$. This proves that $S$ is linearly independent and hence is a basis for $\mathbb{P}_n$. See Figure 2. ■

FIGURE 2 The standard basis for $\mathbb{P}_2$.

Problems involving linear independence and spanning in $\mathbb{P}_n$ are handled best by a technique to be discussed in Section 4.4.

The Spanning Set Theorem 

As we will see, a basis is an “efficient” spanning set that contains no unnecessary vectors. 
In fact, a basis can be constructed from a spanning set by discarding unneeded vectors. 

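EXAMPLE 7 Let

$$\mathbf{v}_1 = \begin{bmatrix} 0 \\ 2 \\ -1 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 2 \\ 2 \\ 0 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} 6 \\ 16 \\ -5 \end{bmatrix}, \quad \text{and} \quad H = \operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$$

Note that $\mathbf{v}_3 = 5\mathbf{v}_1 + 3\mathbf{v}_2$, and show that $\operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\} = \operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$. Then find a basis for the subspace $H$.

SOLUTION Every vector in $\operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$ belongs to $H$ because

$$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + 0\mathbf{v}_3$$

Now let $\mathbf{x}$ be any vector in $H$, say $\mathbf{x} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$. Since $\mathbf{v}_3 = 5\mathbf{v}_1 + 3\mathbf{v}_2$, we may substitute

$$\begin{aligned} \mathbf{x} &= c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3(5\mathbf{v}_1 + 3\mathbf{v}_2) \\ &= (c_1 + 5c_3)\mathbf{v}_1 + (c_2 + 3c_3)\mathbf{v}_2 \end{aligned}$$

Thus $\mathbf{x}$ is in $\operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$, so every vector in $H$ already belongs to $\operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$. We conclude that $H$ and $\operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$ are actually the same set of vectors. It follows that $\{\mathbf{v}_1, \mathbf{v}_2\}$ is a basis of $H$, since neither vector is a multiple of the other and so $\{\mathbf{v}_1, \mathbf{v}_2\}$ is linearly independent. ■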
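
The substitution step in Example 7 can be checked numerically. This is only a sketch: $\mathbf{v}_3$ is taken to be $5\mathbf{v}_1 + 3\mathbf{v}_2 = (6, 16, -5)$, as the stated relation requires, and the sample weights and the helper names `combo` and `rewrite` are hypothetical.

```python
v1 = (0, 2, -1)
v2 = (2, 2, 0)
v3 = (6, 16, -5)

def combo(c1, u, c2, w):
    """Return the linear combination c1*u + c2*w."""
    return tuple(c1 * a + c2 * b for a, b in zip(u, w))

# The dependence relation noted in the example.
assert combo(5, v1, 3, v2) == v3

def rewrite(c1, c2, c3):
    """Weights on v1 and v2 that give the same vector as c1*v1 + c2*v2 + c3*v3."""
    return (c1 + 5 * c3, c2 + 3 * c3)

# Arbitrary sample weights (an assumption for illustration).
c1, c2, c3 = 2, -1, 4
x = tuple(c1 * a + c2 * b + c3 * c for a, b, c in zip(v1, v2, v3))
d1, d2 = rewrite(c1, c2, c3)
same_vector = combo(d1, v1, d2, v2) == x
print(x, (d1, d2), same_vector)
```

The check covers one choice of weights only; the algebra in the solution is what establishes the equality of the two spans for all weights.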

The next theorem generalizes Example 7. 


THEOREM 5 The Spanning Set Theorem

Let $S = \{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ be a set in $V$, and let $H = \operatorname{Span}\{\mathbf{v}_1, \dots, \mathbf{v}_p\}$.

a. If one of the vectors in $S$, say $\mathbf{v}_k$, is a linear combination of the remaining vectors in $S$, then the set formed from $S$ by removing $\mathbf{v}_k$ still spans $H$.

b. If $H \neq \{\mathbf{0}\}$, some subset of $S$ is a basis for $H$.


PROOF

a. By rearranging the list of vectors in $S$, if necessary, we may suppose that $\mathbf{v}_p$ is a linear combination of $\mathbf{v}_1, \dots, \mathbf{v}_{p-1}$, say,

$$\mathbf{v}_p = a_1\mathbf{v}_1 + \cdots + a_{p-1}\mathbf{v}_{p-1} \qquad (3)$$

Given any $\mathbf{x}$ in $H$, we may write

$$\mathbf{x} = c_1\mathbf{v}_1 + \cdots + c_{p-1}\mathbf{v}_{p-1} + c_p\mathbf{v}_p \qquad (4)$$

for suitable scalars $c_1, \dots, c_p$. Substituting the expression for $\mathbf{v}_p$ from (3) into (4), it is easy to see that $\mathbf{x}$ is a linear combination of $\mathbf{v}_1, \dots, \mathbf{v}_{p-1}$. Thus $\{\mathbf{v}_1, \dots, \mathbf{v}_{p-1}\}$ spans $H$, because $\mathbf{x}$ was an arbitrary element of $H$.

b. If the original spanning set $S$ is linearly independent, then it is already a basis for $H$. Otherwise, one of the vectors in $S$ depends on the others and can be deleted, by part (a). So long as there are two or more vectors in the spanning set, we can repeat this process until the spanning set is linearly independent and hence is a basis for $H$. If the spanning set is eventually reduced to one vector, that vector will be nonzero (and hence linearly independent) because $H \neq \{\mathbf{0}\}$. ■
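
The deletion process in part (b) of the proof can be sketched for vectors in $\mathbb{R}^n$. This is a minimal illustration, not an efficient method: it tests whether a vector is redundant by comparing ranks, and it scans from the end of the list so that later vectors are discarded first. The names `rank` and `shrink_to_basis` are hypothetical.

```python
from fractions import Fraction

def rank(vectors):
    """Dimension of the span of a list of vectors, by exact Gaussian elimination."""
    m = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for c in range(len(m[0]) if m else 0):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def shrink_to_basis(spanning_set):
    """Delete redundant vectors one at a time until the set is independent."""
    s = list(spanning_set)
    target = rank(s)                  # dimension of H = Span(s)
    changed = True
    while changed:
        changed = False
        for i in reversed(range(len(s))):
            rest = s[:i] + s[i + 1:]
            # If the rank is unchanged, s[i] lies in the span of the rest,
            # so removing it leaves a set that still spans H.
            if rank(rest) == target:
                s = rest
                changed = True
                break
    return s

# Vectors of Example 7, with v3 = 5*v1 + 3*v2.
basis = shrink_to_basis([(0, 2, -1), (2, 2, 0), (6, 16, -5)])
print(basis)
```

Scanning in a different order can produce a different subset, which is consistent with the theorem: it guarantees that some subset is a basis, not a unique one.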


Bases for Nul A and Col A 

We already know how to find vectors that span the null space of a matrix $A$. The discussion in Section 4.2 pointed out that our method always produces a linearly independent set when Nul $A$ contains nonzero vectors. So, in this case, that method produces a basis for Nul $A$.

The next two examples describe a simple algorithm for finding a basis for the 
column space. 


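EXAMPLE 8 Find a basis for Col $B$, where

$$B = [\,\mathbf{b}_1 \ \ \mathbf{b}_2 \ \ \cdots \ \ \mathbf{b}_5\,] = \begin{bmatrix} 1 & 4 & 0 & 2 & 0 \\ 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$


SOLUTION Each nonpivot column of $B$ is a linear combination of the pivot columns. In fact, $\mathbf{b}_2 = 4\mathbf{b}_1$ and $\mathbf{b}_4 = 2\mathbf{b}_1 - \mathbf{b}_3$. By the Spanning Set Theorem, we may discard $\mathbf{b}_2$ and $\mathbf{b}_4$, and $\{\mathbf{b}_1, \mathbf{b}_3, \mathbf{b}_5\}$ will still span Col $B$. Let

$$S = \{\mathbf{b}_1, \mathbf{b}_3, \mathbf{b}_5\} = \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix} \right\}$$

Since $\mathbf{b}_1 \neq \mathbf{0}$ and no vector in $S$ is a linear combination of the vectors that precede it, $S$ is linearly independent (Theorem 4). Thus $S$ is a basis for Col $B$. ■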


What about a matrix $A$ that is not in reduced echelon form? Recall that any linear dependence relationship among the columns of $A$ can be expressed in the form $A\mathbf{x} = \mathbf{0}$, where $\mathbf{x}$ is a column of weights. (If some columns are not involved in a particular dependence relation, then their weights are zero.) When $A$ is row reduced to a matrix $B$, the columns of $B$ are often totally different from the columns of $A$. However, the equations $A\mathbf{x} = \mathbf{0}$ and $B\mathbf{x} = \mathbf{0}$ have exactly the same set of solutions. If $A = [\,\mathbf{a}_1 \ \cdots \ \mathbf{a}_n\,]$ and $B = [\,\mathbf{b}_1 \ \cdots \ \mathbf{b}_n\,]$, then the vector equations

$$x_1\mathbf{a}_1 + \cdots + x_n\mathbf{a}_n = \mathbf{0} \quad \text{and} \quad x_1\mathbf{b}_1 + \cdots + x_n\mathbf{b}_n = \mathbf{0}$$

also have the same set of solutions. That is, the columns of $A$ have exactly the same linear dependence relationships as the columns of $B$.


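EXAMPLE 9 It can be shown that the matrix

$$A = [\,\mathbf{a}_1 \ \ \mathbf{a}_2 \ \ \cdots \ \ \mathbf{a}_5\,] = \begin{bmatrix} 1 & 4 & 0 & 2 & -1 \\ 3 & 12 & 1 & 5 & 5 \\ 2 & 8 & 1 & 3 & 2 \\ 5 & 20 & 2 & 8 & 8 \end{bmatrix}$$
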
is row equivalent to the matrix $B$ in Example 8. Find a basis for Col $A$.

SOLUTION In Example 8 we saw that

$$\mathbf{b}_2 = 4\mathbf{b}_1 \quad \text{and} \quad \mathbf{b}_4 = 2\mathbf{b}_1 - \mathbf{b}_3$$

so we can expect that

$$\mathbf{a}_2 = 4\mathbf{a}_1 \quad \text{and} \quad \mathbf{a}_4 = 2\mathbf{a}_1 - \mathbf{a}_3$$

Check that this is indeed the case! Thus we may discard $\mathbf{a}_2$ and $\mathbf{a}_4$ when selecting a minimal spanning set for Col $A$. In fact, $\{\mathbf{a}_1, \mathbf{a}_3, \mathbf{a}_5\}$ must be linearly independent because any linear dependence relationship among $\mathbf{a}_1, \mathbf{a}_3, \mathbf{a}_5$ would imply a linear dependence relationship among $\mathbf{b}_1, \mathbf{b}_3, \mathbf{b}_5$. But we know that $\{\mathbf{b}_1, \mathbf{b}_3, \mathbf{b}_5\}$ is a linearly independent set. Thus $\{\mathbf{a}_1, \mathbf{a}_3, \mathbf{a}_5\}$ is a basis for Col $A$. The columns we have used for this basis are the pivot columns of $A$. ■

Examples 8 and 9 illustrate the following useful fact. 


THEOREM 6 


The pivot columns of a matrix A form a basis for Col A . 


PROOF The general proof uses the arguments discussed above. Let B be the reduced 
echelon form of A. The set of pivot columns of B is linearly independent, for no 
vector in the set is a linear combination of the vectors that precede it. Since A is row 
equivalent to B , the pivot columns of A are linearly independent as well, because any 
linear dependence relation among the columns of A corresponds to a linear dependence 
relation among the columns of B . For this same reason, every nonpivot column of A is 
a linear combination of the pivot columns of A . Thus the nonpivot columns of A may 
be discarded from the spanning set for Col A , by the Spanning Set Theorem. This leaves 
the pivot columns of A as a basis for Col A . ■ 

Warning: The pivot columns of a matrix A are evident when A has been reduced only 
to echelon form. But, be careful to use the pivot columns of A itself for the basis of 
Col A. Row operations can change the column space of a matrix. The columns of an 
echelon form B of A are often not in the column space of A. For instance, the columns 
of matrix B in Example 8 all have zeros in their last entries, so they cannot span the 
column space of matrix A in Example 9. 
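
Theorem 6 and the warning above can be illustrated with a short sketch. The $4 \times 5$ matrix below is an assumption, chosen so that its columns satisfy $\mathbf{a}_2 = 4\mathbf{a}_1$ and $\mathbf{a}_4 = 2\mathbf{a}_1 - \mathbf{a}_3$ as in Example 9; the helper name `rref` is hypothetical. The code locates the pivot columns by row reduction, then takes the basis vectors from the original matrix rather than from its reduced form.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced echelon form, pivot column indices) of a matrix."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [x / piv for x in m[r]]    # scale the pivot to 1
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

# Assumed entries, consistent with a2 = 4*a1 and a4 = 2*a1 - a3.
A = [[1,  4, 0, 2, -1],
     [3, 12, 1, 5,  5],
     [2,  8, 1, 3,  2],
     [5, 20, 2, 8,  8]]

B, pivots = rref(A)

# Basis for Col A: the pivot columns of A itself, not of B.
basis = [[row[c] for row in A] for c in pivots]
print(pivots)
print(basis)
```

Every column of the reduced matrix has a zero last entry, while the columns of this particular `A` do not, which is one way to see why the columns of `B` cannot serve as a basis for Col $A$.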


Two Views of a Basis 

When the Spanning Set Theorem is used, the deletion of vectors from a spanning set 
must stop when the set becomes linearly independent. If an additional vector is deleted, 
it will not be a linear combination of the remaining vectors, and hence the smaller set 
will no longer span V. Thus a basis is a spanning set that is as small as possible. 

A basis is also a linearly independent set that is as large as possible. If S is a basis 
for V, and if S is enlarged by one vector—say, w—from V, then the new set cannot be 
linearly independent, because S spans V, and w is therefore a linear combination of the 
elements in S . 

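EXAMPLE 10 The following three sets in $\mathbb{R}^3$ show how a linearly independent set can be enlarged to a basis and how further enlargement destroys the linear independence of the set. Also, a spanning set can be shrunk to a basis, but further shrinking destroys the spanning property.

$$\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \\ 0 \end{bmatrix} \right\} \qquad \text{Linearly independent but does not span } \mathbb{R}^3$$

$$\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \\ 0 \end{bmatrix}, \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} \right\} \qquad \text{A basis for } \mathbb{R}^3$$

$$\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \\ 0 \end{bmatrix}, \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}, \begin{bmatrix} 7 \\ 8 \\ 9 \end{bmatrix} \right\} \qquad \text{Spans } \mathbb{R}^3 \text{ but is linearly dependent}$$

■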



Mastering: Basis 4-9 


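PRACTICE PROBLEMS


1. Let $\mathbf{v}_1 = \begin{bmatrix} 1 \\ -2 \\ 3 \end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix} -2 \\ 7 \\ -9 \end{bmatrix}$. Determine if $\{\mathbf{v}_1, \mathbf{v}_2\}$ is a basis for $\mathbb{R}^3$. Is $\{\mathbf{v}_1, \mathbf{v}_2\}$ a basis for $\mathbb{R}^2$?

2. Let $\mathbf{v}_1 = \begin{bmatrix} 1 \\ -3 \\ 4 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 6 \\ 2 \\ -1 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 2 \\ -2 \\ 3 \end{bmatrix}$, and $\mathbf{v}_4 = \begin{bmatrix} -4 \\ -8 \\ 9 \end{bmatrix}$. Find a basis for the subspace $W$ spanned by $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$.

3. Let $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$, and $H = \left\{ \begin{bmatrix} s \\ s \\ 0 \end{bmatrix} : s \text{ in } \mathbb{R} \right\}$. Then every vector in $H$ is a linear combination of $\mathbf{v}_1$ and $\mathbf{v}_2$ because

$$\begin{bmatrix} s \\ s \\ 0 \end{bmatrix} = s\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + s\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$

Is $\{\mathbf{v}_1, \mathbf{v}_2\}$ a basis for $H$?

4. Let $V$ and $W$ be vector spaces, let $T : V \to W$ and $U : V \to W$ be linear transformations, and let $\{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ be a basis for $V$. If $T(\mathbf{v}_j) = U(\mathbf{v}_j)$ for every value of $j$ between 1 and $p$, show that $T(\mathbf{x}) = U(\mathbf{x})$ for every vector $\mathbf{x}$ in $V$.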


4.3 EXERCISES 


Determine which sets in Exercises 1-8 are bases for $\mathbb{R}^3$. Of the sets that are not bases, determine which ones are linearly independent and which ones span $\mathbb{R}^3$. Justify your answers.
1

O 

1_ 


1 

0
1.

0 

9 

1 

9 

1 

2. 

0 

5 

0 

5 

1 


1 

0 
_1 


1 

0 

_1 


1 


1 


1 

0 
_1
0
_1 



1 


1 

LO 

_1 


-3 


1 

ts) 
_1 


1" 


~-7“ 

3. 

1 

O <N 

_1 

5 

2 

_ —4 _ 

5 

1 

1_ 

4. 

-2 

1 


-3 

2 

5 

5 

4 



1 


1 

to 

_1 


1 

0 

1_ 


1 

0 

1_ 


1 


1 

1_ 

5. 

-3 

5 

9 

5 

0 

5 

-3 

6. 

2 

5 

-5 


1 

0 

_1 


1 

0 

_1 


1 

0 

_1 


1 

1_ 


1 

LO 

1_ 


1 

so 

_1 



1 

CN 

1 _ 


1 

so 

1 _ 


1 


1 

O 

l _ 


1 

LO 

_ 1 


1 

O 

1 _ 

7. 

3 

0 _ 

5 

1 

Ol J- 

1 _ 

8. 

1 

_ l 

9 

l 

uj 

1 _ 


1 

_ 1 

5 

1 

CN CN 

_ 1 


Find bases for the null spaces of the matrices given in Exercises 9 and 10. Refer to the remarks that follow Example 3 in Section 4.2.

9. $\begin{bmatrix} 1 & 0 & -3 & 2 \\ 0 & 1 & -5 & 4 \\ 3 & -2 & 1 & -2 \end{bmatrix}$

10. $\begin{bmatrix} 1 & 0 & -5 & 1 & 4 \\ -2 & 1 & 6 & -2 & -2 \\ 0 & 2 & -8 & 1 & 9 \end{bmatrix}$


11. Find a basis for the set of vectors in R 3 in the plane 
x + 2y + z = 0. [Hint: Think of the equation as a “system” 
of homogeneous equations.] 


12. Find a basis for the set of vectors in R 2 on the line y = 5x. 


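In Exercises 13 and 14, assume that $A$ is row equivalent to $B$. Find bases for Nul $A$ and Col $A$.

13. $A = \begin{bmatrix} -2 & 4 & -2 & -4 \\ 2 & -6 & -3 & 1 \\ -3 & 8 & 2 & -3 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 0 & 6 & 5 \\ 0 & 2 & 5 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

14. $A = \begin{bmatrix} 1 & 2 & -5 & 11 & -3 \\ 2 & 4 & -5 & 15 & 2 \\ 1 & 2 & 0 & 4 & 5 \\ 3 & 6 & -5 & 19 & -2 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 2 & 0 & 4 & 5 \\ 0 & 0 & 5 & -7 & 8 \\ 0 & 0 & 0 & 0 & -9 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$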

In Exercises 15-18, find a basis for the space spanned by the given 
vectors, Vi,..., v 5 . 



16. 


1 

0 

0 

1 


17. [M] 


2 

1 

1 


6 

-1 

2 


5 

3 

3 


0 

3 

-1 


1 

-1 

1-
oo

_1 


1 


-1 


os 

_i 


-1 

9 


5 


-4 


OO 


4 

-3 

5 

1 

5 

-9 

5 

4 

5 

11 

-6 


-4 


6 


-7 


OO 

i— 

o 

i_ 


i 

'xf 

_1 


-7 


i— 

o 

i_ 


-7 


18. [M] 


i 

00 

_1 


1 

oo 

1_ 


i 

00 

_1
i

so 

_1
-7


7 


4 


3 

6 

5 

-9 

5 

4 

9 

9 

5 

-4
-5
-1

-7
-7


-7 


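0
1


19. Let $\mathbf{v}_1 = \begin{bmatrix} 4 \\ -3 \\ 7 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 1 \\ 9 \\ -2 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 7 \\ 11 \\ 6 \end{bmatrix}$, and $H = \operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. It can be verified that $4\mathbf{v}_1 + 5\mathbf{v}_2 - 3\mathbf{v}_3 = \mathbf{0}$. Use this information to find a basis for $H$. There is more than one answer.


20. Let $\mathbf{v}_1 = \begin{bmatrix} 7 \\ 4 \\ -9 \\ -5 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 4 \\ -7 \\ 2 \\ 5 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 1 \\ -5 \\ 3 \\ 4 \end{bmatrix}$. It can be verified that $\mathbf{v}_1 - 3\mathbf{v}_2 + 5\mathbf{v}_3 = \mathbf{0}$. Use this information to find a basis for $H = \operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$.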


In Exercises 21 and 22, mark each statement True or False. Justify each answer.

21. a. A single vector by itself is linearly dependent.

b. If $H = \operatorname{Span}\{\mathbf{b}_1, \dots, \mathbf{b}_p\}$, then $\{\mathbf{b}_1, \dots, \mathbf{b}_p\}$ is a basis for $H$.

c. The columns of an invertible $n \times n$ matrix form a basis for $\mathbb{R}^n$.

d. A basis is a spanning set that is as large as possible.

e. In some cases, the linear dependence relations among the columns of a matrix can be affected by certain elementary row operations on the matrix.

22. a. A linearly independent set in a subspace $H$ is a basis for $H$.

b. If a finite set $S$ of nonzero vectors spans a vector space $V$, then some subset of $S$ is a basis for $V$.

c. A basis is a linearly independent set that is as large as possible.

d. The standard method for producing a spanning set for Nul $A$, described in Section 4.2, sometimes fails to produce a basis for Nul $A$.

e. If $B$ is an echelon form of a matrix $A$, then the pivot columns of $B$ form a basis for Col $A$.

23. Suppose $\mathbb{R}^4 = \operatorname{Span}\{\mathbf{v}_1, \dots, \mathbf{v}_4\}$. Explain why $\{\mathbf{v}_1, \dots, \mathbf{v}_4\}$ is a basis for $\mathbb{R}^4$.


24. Let $\mathcal{B} = \{\mathbf{v}_1, \dots, \mathbf{v}_n\}$ be a linearly independent set in $\mathbb{R}^n$. Explain why $\mathcal{B}$ must be a basis for $\mathbb{R}^n$.


25. Let $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$, and let $H$ be the set of vectors in $\mathbb{R}^3$ whose second and third entries are equal. Then every vector in $H$ has a unique expansion as a linear combination of $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$, because

$$\begin{bmatrix} s \\ t \\ t \end{bmatrix} = s\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + (t - s)\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} + s\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$

for any $s$ and $t$. Is $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ a basis for $H$? Why or why not?


26. In the vector space of all real-valued functions, find a basis for the subspace spanned by $\{\sin t, \sin 2t, \sin t \cos t\}$.

27. Let $V$ be the vector space of functions that describe the vibration of a mass-spring system. (Refer to Exercise 19 in Section 4.1.) Find a basis for $V$.


28. (RLC circuit) The circuit in the figure consists of a resistor ($R$ ohms), an inductor ($L$ henrys), a capacitor ($C$ farads), and an initial voltage source. Let $b = R/(2L)$, and suppose $R$, $L$, and $C$ have been selected so that $b$ also equals $1/\sqrt{LC}$. (This is done, for instance, when the circuit is used in a voltmeter.) Let $v(t)$ be the voltage (in volts) at time $t$, measured across the capacitor. It can be shown that $v$ is in the null space $H$ of the linear transformation that maps $v(t)$ into $Lv''(t) + Rv'(t) + (1/C)v(t)$, and $H$ consists of all functions of the form $v(t) = e^{-bt}(c_1 + c_2 t)$. Find a basis for $H$.

[Figure: circuit with a voltage source, resistor $R$, inductor $L$, and capacitor $C$]

Exercises 29 and 30 show that every basis for $\mathbb{R}^n$ must contain exactly $n$ vectors.

29. Let $S = \{\mathbf{v}_1, \dots, \mathbf{v}_k\}$ be a set of $k$ vectors in $\mathbb{R}^n$, with $k < n$. Use a theorem from Section 1.4 to explain why $S$ cannot be a basis for $\mathbb{R}^n$.

30. Let $S = \{\mathbf{v}_1, \dots, \mathbf{v}_k\}$ be a set of $k$ vectors in $\mathbb{R}^n$, with $k > n$. Use a theorem from Chapter 1 to explain why $S$ cannot be a basis for $\mathbb{R}^n$.

Exercises 31 and 32 reveal an important connection between linear independence and linear transformations and provide practice using the definition of linear dependence. Let $V$ and $W$ be vector spaces, let $T : V \to W$ be a linear transformation, and let $\{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ be a subset of $V$.

31. Show that if $\{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ is linearly dependent in $V$, then the set of images, $\{T(\mathbf{v}_1), \dots, T(\mathbf{v}_p)\}$, is linearly dependent in $W$. This fact shows that if a linear transformation maps a set $\{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ onto a linearly independent set $\{T(\mathbf{v}_1), \dots, T(\mathbf{v}_p)\}$, then the original set is linearly independent, too (because it cannot be linearly dependent).

32. Suppose that $T$ is a one-to-one transformation, so that an equation $T(\mathbf{u}) = T(\mathbf{v})$ always implies $\mathbf{u} = \mathbf{v}$. Show that if the set of images $\{T(\mathbf{v}_1), \dots, T(\mathbf{v}_p)\}$ is linearly dependent, then $\{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ is linearly dependent. This fact shows that a one-to-one linear transformation maps a linearly independent set onto a linearly independent set (because in this case the set of images cannot be linearly dependent).

33. Consider the polynomials $\mathbf{p}_1(t) = 1 + t^2$ and $\mathbf{p}_2(t) = 1 - t^2$. Is $\{\mathbf{p}_1, \mathbf{p}_2\}$ a linearly independent set in $\mathbb{P}_3$? Why or why not?

34. Consider the polynomials $\mathbf{p}_1(t) = 1 + t$, $\mathbf{p}_2(t) = 1 - t$, and $\mathbf{p}_3(t) = 2$ (for all $t$). By inspection, write a linear dependence relation among $\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$. Then find a basis for $\operatorname{Span}\{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$.


35. Let $V$ be a vector space that contains a linearly independent set $\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \mathbf{u}_4\}$. Describe how to construct a set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$ in $V$ such that $\{\mathbf{v}_1, \mathbf{v}_3\}$ is a basis for $\operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$.


36. [M] Let $H = \operatorname{Span}\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ and $K = \operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$, where



Find bases for $H$, $K$, and $H + K$. (See Exercises 33 and 34 in Section 4.1.)


37. [M] Show that $\{t, \sin t, \cos 2t, \sin t \cos t\}$ is a linearly independent set of functions defined on $\mathbb{R}$. Start by assuming that

$$c_1 \cdot t + c_2 \cdot \sin t + c_3 \cdot \cos 2t + c_4 \cdot \sin t \cos t = 0 \qquad (5)$$

Equation (5) must hold for all real $t$, so choose several specific values of $t$ (say, $t = 0, .1, .2$) until you get a system of enough equations to determine that all the $c_j$ must be zero.

38. [M] Show that $\{1, \cos t, \cos^2 t, \dots, \cos^6 t\}$ is a linearly independent set of functions defined on $\mathbb{R}$. Use the method of Exercise 37. (This result will be needed in Exercise 34 in Section 4.5.)


WEB 


SOLUTIONS TO PRACTICE PROBLEMS 


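1. Let $A = [\,\mathbf{v}_1 \ \ \mathbf{v}_2\,]$. Row operations show that

$$A = \begin{bmatrix} 1 & -2 \\ -2 & 7 \\ 3 & -9 \end{bmatrix} \sim \begin{bmatrix} 1 & -2 \\ 0 & 3 \\ 0 & 0 \end{bmatrix}$$

Not every row of $A$ contains a pivot position. So the columns of $A$ do not span $\mathbb{R}^3$, by Theorem 4 in Section 1.4. Hence $\{\mathbf{v}_1, \mathbf{v}_2\}$ is not a basis for $\mathbb{R}^3$. Since $\mathbf{v}_1$ and $\mathbf{v}_2$ are not in $\mathbb{R}^2$, they cannot possibly be a basis for $\mathbb{R}^2$. However, since $\mathbf{v}_1$ and $\mathbf{v}_2$ are obviously linearly independent, they are a basis for a subspace of $\mathbb{R}^3$, namely, $\operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$.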

2. Set up a matrix $A$ whose column space is the space spanned by $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$, and then row reduce $A$ to find its pivot columns.

$$A = \begin{bmatrix} 1 & 6 & 2 & -4 \\ -3 & 2 & -2 & -8 \\ 4 & -1 & 3 & 9 \end{bmatrix} \sim \begin{bmatrix} 1 & 6 & 2 & -4 \\ 0 & 20 & 4 & -20 \\ 0 & -25 & -5 & 25 \end{bmatrix} \sim \begin{bmatrix} 1 & 6 & 2 & -4 \\ 0 & 5 & 1 & -5 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

The first two columns of $A$ are the pivot columns and hence form a basis of Col $A$ = $W$. Hence $\{\mathbf{v}_1, \mathbf{v}_2\}$ is a basis for $W$. Note that the reduced echelon form of $A$ is not needed in order to locate the pivot columns.

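3. Neither $\mathbf{v}_1$ nor $\mathbf{v}_2$ is in $H$, so $\{\mathbf{v}_1, \mathbf{v}_2\}$ cannot be a basis for $H$. In fact, $\{\mathbf{v}_1, \mathbf{v}_2\}$ is a basis for the plane of all vectors of the form $(c_1, c_2, 0)$, but $H$ is only a line.

4. Since $\{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ is a basis for $V$, for any vector $\mathbf{x}$ in $V$, there exist scalars $c_1, \dots, c_p$ such that $\mathbf{x} = c_1\mathbf{v}_1 + \cdots + c_p\mathbf{v}_p$. Then since $T$ and $U$ are linear transformations

$$\begin{aligned} T(\mathbf{x}) &= T(c_1\mathbf{v}_1 + \cdots + c_p\mathbf{v}_p) = c_1T(\mathbf{v}_1) + \cdots + c_pT(\mathbf{v}_p) \\ &= c_1U(\mathbf{v}_1) + \cdots + c_pU(\mathbf{v}_p) = U(c_1\mathbf{v}_1 + \cdots + c_p\mathbf{v}_p) \\ &= U(\mathbf{x}) \end{aligned}$$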


4.4 COORDINATE SYSTEMS 


An important reason for specifying a basis $\mathcal{B}$ for a vector space $V$ is to impose a “coordinate system” on $V$. This section will show that if $\mathcal{B}$ contains $n$ vectors, then the coordinate system will make $V$ act like $\mathbb{R}^n$. If $V$ is already $\mathbb{R}^n$ itself, then $\mathcal{B}$ will determine a coordinate system that gives a new “view” of $V$.

The existence of coordinate systems rests on the following fundamental result.


THEOREM 7 The Unique Representation Theorem

Let $\mathcal{B} = \{\mathbf{b}_1, \dots, \mathbf{b}_n\}$ be a basis for a vector space $V$. Then for each $\mathbf{x}$ in $V$, there exists a unique set of scalars $c_1, \dots, c_n$ such that

$$\mathbf{x} = c_1\mathbf{b}_1 + \cdots + c_n\mathbf{b}_n \qquad (1)$$

PROOF Since $\mathcal{B}$ spans $V$, there exist scalars such that (1) holds. Suppose $\mathbf{x}$ also has the representation

$$\mathbf{x} = d_1\mathbf{b}_1 + \cdots + d_n\mathbf{b}_n$$

for scalars $d_1, \dots, d_n$. Then, subtracting, we have

$$\mathbf{0} = \mathbf{x} - \mathbf{x} = (c_1 - d_1)\mathbf{b}_1 + \cdots + (c_n - d_n)\mathbf{b}_n \qquad (2)$$

Since $\mathcal{B}$ is linearly independent, the weights in (2) must all be zero. That is, $c_j = d_j$ for $1 \le j \le n$. ■


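Suppose $\mathcal{B} = \{\mathbf{b}_1, \dots, \mathbf{b}_n\}$ is a basis for $V$ and $\mathbf{x}$ is in $V$. The coordinates of $\mathbf{x}$ relative to the basis $\mathcal{B}$ (or the $\mathcal{B}$-coordinates of $\mathbf{x}$) are the weights $c_1, \dots, c_n$ such that $\mathbf{x} = c_1\mathbf{b}_1 + \cdots + c_n\mathbf{b}_n$.


If $c_1, \dots, c_n$ are the $\mathcal{B}$-coordinates of $\mathbf{x}$, then the vector in $\mathbb{R}^n$

$$[\mathbf{x}]_\mathcal{B} = \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix}$$

is the coordinate vector of $\mathbf{x}$ (relative to $\mathcal{B}$), or the $\mathcal{B}$-coordinate vector of $\mathbf{x}$. The mapping $\mathbf{x} \mapsto [\mathbf{x}]_\mathcal{B}$ is the coordinate mapping (determined by $\mathcal{B}$).$^1$


$^1$ The concept of a coordinate mapping assumes that the basis $\mathcal{B}$ is an indexed set whose vectors are listed in some fixed preassigned order. This property makes the definition of $[\mathbf{x}]_\mathcal{B}$ unambiguous.

EXAMPLE 1 Consider a basis $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$ for $\mathbb{R}^2$, where $\mathbf{b}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\mathbf{b}_2 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$. Suppose an $\mathbf{x}$ in $\mathbb{R}^2$ has the coordinate vector $[\mathbf{x}]_\mathcal{B} = \begin{bmatrix} -2 \\ 3 \end{bmatrix}$. Find $\mathbf{x}$.


SOLUTION The $\mathcal{B}$-coordinates of $\mathbf{x}$ tell how to build $\mathbf{x}$ from the vectors in $\mathcal{B}$. That is,

$$\mathbf{x} = (-2)\mathbf{b}_1 + 3\mathbf{b}_2 = (-2)\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 3\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \\ 6 \end{bmatrix}$$ ■


EXAMPLE 2 The entries in the vector $\mathbf{x} = \begin{bmatrix} 1 \\ 6 \end{bmatrix}$ are the coordinates of $\mathbf{x}$ relative to the standard basis $\mathcal{E} = \{\mathbf{e}_1, \mathbf{e}_2\}$, since

$$\begin{bmatrix} 1 \\ 6 \end{bmatrix} = 1 \cdot \begin{bmatrix} 1 \\ 0 \end{bmatrix} + 6 \cdot \begin{bmatrix} 0 \\ 1 \end{bmatrix} = 1 \cdot \mathbf{e}_1 + 6 \cdot \mathbf{e}_2$$

If $\mathcal{E} = \{\mathbf{e}_1, \mathbf{e}_2\}$, then $[\mathbf{x}]_\mathcal{E} = \mathbf{x}$. ■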

A Graphical Interpretation of Coordinates 


A coordinate system on a set consists of a one-to-one mapping of the points in the set into $\mathbb{R}^n$. For example, ordinary graph paper provides a coordinate system for the plane when one selects perpendicular axes and a unit of measurement on each axis. Figure 1 shows the standard basis $\{\mathbf{e}_1, \mathbf{e}_2\}$, the vectors $\mathbf{b}_1$ ($= \mathbf{e}_1$) and $\mathbf{b}_2$ from Example 1, and the vector $\mathbf{x} = \begin{bmatrix} 1 \\ 6 \end{bmatrix}$. The coordinates 1 and 6 give the location of $\mathbf{x}$ relative to the standard basis: 1 unit in the $\mathbf{e}_1$ direction and 6 units in the $\mathbf{e}_2$ direction.

Figure 2 shows the vectors $\mathbf{b}_1$, $\mathbf{b}_2$, and $\mathbf{x}$ from Figure 1. (Geometrically, the three vectors lie on a vertical line in both figures.) However, the standard coordinate grid was erased and replaced by a grid especially adapted to the basis $\mathcal{B}$ in Example 1. The coordinate vector $[\mathbf{x}]_\mathcal{B} = \begin{bmatrix} -2 \\ 3 \end{bmatrix}$ gives the location of $\mathbf{x}$ on this new coordinate system: $-2$ units in the $\mathbf{b}_1$ direction and 3 units in the $\mathbf{b}_2$ direction.


FIGURE 1 Standard graph paper.

FIGURE 2 $\mathcal{B}$-graph paper.


EXAMPLE 3 In crystallography, the description of a crystal lattice is aided by choosing a basis $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$ for $\mathbb{R}^3$ that corresponds to three adjacent edges of one “unit cell” of the crystal. An entire lattice is constructed by stacking together many copies of one cell. There are fourteen basic types of unit cells; three are displayed in Figure 3.$^2$


$^2$ Adapted from The Science and Engineering of Materials, 4th Ed., by Donald R. Askeland (Boston: Prindle, Weber & Schmidt, ©2002), p. 36.

FIGURE 3 Examples of unit cells: (a) simple monoclinic, (b) body-centered cubic, (c) face-centered orthorhombic.


The coordinates of atoms within the crystal are given relative to the basis for the lattice. For instance,

$$\begin{bmatrix} 1/2 \\ 1/2 \\ 1 \end{bmatrix}$$

identifies the top face-centered atom in the cell in Figure 3(c). ■


Coordinates in R w 

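When a basis $\mathcal{B}$ for $\mathbb{R}^n$ is fixed, the $\mathcal{B}$-coordinate vector of a specified $\mathbf{x}$ is easily found, as in the next example.


EXAMPLE 4 Let $\mathbf{b}_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$, $\mathbf{b}_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$, $\mathbf{x} = \begin{bmatrix} 4 \\ 5 \end{bmatrix}$, and $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$. Find the coordinate vector $[\mathbf{x}]_\mathcal{B}$ of $\mathbf{x}$ relative to $\mathcal{B}$.


FIGURE 4 The $\mathcal{B}$-coordinate vector of $\mathbf{x}$ is $(3, 2)$.


SOLUTION The $\mathcal{B}$-coordinates $c_1$, $c_2$ of $\mathbf{x}$ satisfy

$$c_1\begin{bmatrix} 2 \\ 1 \end{bmatrix} + c_2\begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \end{bmatrix} \quad \text{or} \quad \begin{bmatrix} 2 & -1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \end{bmatrix} \qquad (3)$$

This equation can be solved by row operations on an augmented matrix or by using the inverse of the matrix on the left. In any case, the solution is $c_1 = 3$, $c_2 = 2$. Thus $\mathbf{x} = 3\mathbf{b}_1 + 2\mathbf{b}_2$, and

$$[\mathbf{x}]_\mathcal{B} = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$$

See Figure 4. ■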
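
Finding $\mathcal{B}$-coordinates in $\mathbb{R}^2$ amounts to solving a $2 \times 2$ linear system. The sketch below uses Cramer's rule, which is a swapped-in technique rather than the row operations or matrix inverse mentioned in the solution. The vectors are hypothetical data, chosen so that the answer agrees with $c_1 = 3$, $c_2 = 2$, and the function name `coords_2x2` is an assumption.

```python
from fractions import Fraction

def coords_2x2(b1, b2, x):
    """Solve c1*b1 + c2*b2 = x for (c1, c2) by Cramer's rule."""
    det = b1[0] * b2[1] - b2[0] * b1[1]
    if det == 0:
        raise ValueError("b1 and b2 are linearly dependent, so they are not a basis")
    c1 = Fraction(x[0] * b2[1] - b2[0] * x[1], det)
    c2 = Fraction(b1[0] * x[1] - x[0] * b1[1], det)
    return c1, c2

# Hypothetical basis vectors and target vector (assumptions for illustration).
b1, b2, x = (2, 1), (-1, 1), (4, 5)
c1, c2 = coords_2x2(b1, b2, x)

# Rebuild x from its coordinates as a check.
rebuilt = tuple(c1 * p + c2 * q for p, q in zip(b1, b2))
print(c1, c2, rebuilt == x)
```

For larger bases, row reducing the augmented matrix $[\,\mathbf{b}_1 \ \cdots \ \mathbf{b}_n \ \ \mathbf{x}\,]$ is the more practical route.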



The matrix in (3) changes the $\mathcal{B}$-coordinates of a vector $\mathbf{x}$ into the standard coordinates for $\mathbf{x}$. An analogous change of coordinates can be carried out in $\mathbb{R}^n$ for a basis $\mathcal{B} = \{\mathbf{b}_1, \dots, \mathbf{b}_n\}$. Let

$$P_\mathcal{B} = [\,\mathbf{b}_1 \ \ \mathbf{b}_2 \ \ \cdots \ \ \mathbf{b}_n\,]$$

Then the vector equation

$$\mathbf{x} = c_1\mathbf{b}_1 + c_2\mathbf{b}_2 + \cdots + c_n\mathbf{b}_n$$

is equivalent to

$$\mathbf{x} = P_\mathcal{B}[\mathbf{x}]_\mathcal{B} \qquad (4)$$

We call $P_\mathcal{B}$ the change-of-coordinates matrix from $\mathcal{B}$ to the standard basis in $\mathbb{R}^n$. Left-multiplication by $P_\mathcal{B}$ transforms the coordinate vector $[\mathbf{x}]_\mathcal{B}$ into $\mathbf{x}$. The change-of-coordinates equation (4) is important and will be needed at several points in Chapters 5 and 7.

Since the columns of $P_\mathcal{B}$ form a basis for $\mathbb{R}^n$, $P_\mathcal{B}$ is invertible (by the Invertible Matrix Theorem). Left-multiplication by $P_\mathcal{B}^{-1}$ converts $\mathbf{x}$ into its $\mathcal{B}$-coordinate vector:

$$P_\mathcal{B}^{-1}\mathbf{x} = [\mathbf{x}]_\mathcal{B}$$

The correspondence $\mathbf{x} \mapsto [\mathbf{x}]_\mathcal{B}$, produced here by $P_\mathcal{B}^{-1}$, is the coordinate mapping mentioned earlier. Since $P_\mathcal{B}^{-1}$ is an invertible matrix, the coordinate mapping is a one-to-one linear transformation from $\mathbb{R}^n$ onto $\mathbb{R}^n$, by the Invertible Matrix Theorem. (See also Theorem 12 in Section 1.9.) This property of the coordinate mapping is also true in a general vector space that has a basis, as we shall see.
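
The two directions of the change of coordinates can be sketched in a few lines. The example below uses the basis of Example 1, $\mathbf{b}_1 = (1, 0)$ and $\mathbf{b}_2 = (1, 2)$, with the coordinate vector $(-2, 3)$; the helper names `inverse_2x2` and `matvec` are hypothetical, and the inverse formula shown applies only to $2 \times 2$ matrices.

```python
from fractions import Fraction

def inverse_2x2(P):
    """Inverse of a 2x2 matrix [[a, b], [c, d]] via the adjugate formula."""
    (a, b), (c, d) = P
    det = a * d - b * c
    if det == 0:
        raise ValueError("the columns are not a basis")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def matvec(M, v):
    """Multiply a matrix by a vector."""
    return [sum(m * t for m, t in zip(row, v)) for row in M]

# The columns of P_B are b1 = (1, 0) and b2 = (1, 2).
P_B = [[1, 1],
       [0, 2]]
P_B_inv = inverse_2x2(P_B)

coords = [-2, 3]
x = matvec(P_B, coords)        # B-coordinates -> standard coordinates
back = matvec(P_B_inv, x)      # standard coordinates -> B-coordinates
print(x, back)
```

The round trip returning the original coordinate vector reflects the fact that the coordinate mapping is one-to-one and onto.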


The Coordinate Mapping 

Choosing a basis $\mathcal{B} = \{\mathbf{b}_1, \dots, \mathbf{b}_n\}$ for a vector space $V$ introduces a coordinate system in $V$. The coordinate mapping $\mathbf{x} \mapsto [\mathbf{x}]_\mathcal{B}$ connects the possibly unfamiliar space $V$ to the familiar space $\mathbb{R}^n$. See Figure 5. Points in $V$ can now be identified by their new “names.”


THEOREM 8 Let $\mathcal{B} = \{\mathbf{b}_1, \dots, \mathbf{b}_n\}$ be a basis for a vector space $V$. Then the coordinate mapping $\mathbf{x} \mapsto [\mathbf{x}]_\mathcal{B}$ is a one-to-one linear transformation from $V$ onto $\mathbb{R}^n$.


PROOF Take two typical vectors in V , say, 

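\[
\mathbf{u} = c_1\mathbf{b}_1 + \cdots + c_n\mathbf{b}_n
\]
\[
\mathbf{w} = d_1\mathbf{b}_1 + \cdots + d_n\mathbf{b}_n
\]

Then, using vector operations,

\[
\mathbf{u} + \mathbf{w} = (c_1 + d_1)\mathbf{b}_1 + \cdots + (c_n + d_n)\mathbf{b}_n
\]

It follows that

\[
[\mathbf{u} + \mathbf{w}]_{\mathcal{B}}
= \begin{bmatrix} c_1 + d_1 \\ \vdots \\ c_n + d_n \end{bmatrix}
= \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix}
+ \begin{bmatrix} d_1 \\ \vdots \\ d_n \end{bmatrix}
= [\mathbf{u}]_{\mathcal{B}} + [\mathbf{w}]_{\mathcal{B}}
\]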


So the coordinate mapping preserves addition. If r is any scalar, then 


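\[
r\mathbf{u} = r(c_1\mathbf{b}_1 + \cdots + c_n\mathbf{b}_n) = (rc_1)\mathbf{b}_1 + \cdots + (rc_n)\mathbf{b}_n
\]

So

\[
[r\mathbf{u}]_{\mathcal{B}}
= \begin{bmatrix} rc_1 \\ \vdots \\ rc_n \end{bmatrix}
= r\begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix}
= r[\mathbf{u}]_{\mathcal{B}}
\]

Thus the coordinate mapping also preserves scalar multiplication and hence is a linear transformation. See Exercises 23 and 24 for verification that the coordinate mapping is one-to-one and maps $V$ onto $\mathbb{R}^n$.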



The linearity of the coordinate mapping extends to linear combinations, just as in Section 1.8. If $\mathbf{u}_1, \ldots, \mathbf{u}_p$ are in $V$ and if $c_1, \ldots, c_p$ are scalars, then

\[
[c_1\mathbf{u}_1 + \cdots + c_p\mathbf{u}_p]_{\mathcal{B}} = c_1[\mathbf{u}_1]_{\mathcal{B}} + \cdots + c_p[\mathbf{u}_p]_{\mathcal{B}} \tag{5}
\]

In words, (5) says that the $\mathcal{B}$-coordinate vector of a linear combination of $\mathbf{u}_1, \ldots, \mathbf{u}_p$ is the same linear combination of their coordinate vectors.

The coordinate mapping in Theorem 8 is an important example of an isomorphism from $V$ onto $\mathbb{R}^n$. In general, a one-to-one linear transformation from a vector space $V$ onto a vector space $W$ is called an isomorphism from $V$ onto $W$ (iso from the Greek for “the same,” and morph from the Greek for “form” or “structure”). The notation and terminology for $V$ and $W$ may differ, but the two spaces are indistinguishable as vector spaces. Every vector space calculation in $V$ is accurately reproduced in $W$, and vice versa. In particular, any real vector space with a basis of $n$ vectors is indistinguishable from $\mathbb{R}^n$. See Exercises 25 and 26.


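EXAMPLE 5 Let $\mathcal{B}$ be the standard basis of the space $\mathbb{P}_3$ of polynomials; that is, let $\mathcal{B} = \{1, t, t^2, t^3\}$. A typical element $\mathbf{p}$ of $\mathbb{P}_3$ has the form

\[
\mathbf{p}(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3
\]

Since $\mathbf{p}$ is already displayed as a linear combination of the standard basis vectors, we conclude that

\[
[\mathbf{p}]_{\mathcal{B}} = \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix}
\]

Thus the coordinate mapping $\mathbf{p} \mapsto [\mathbf{p}]_{\mathcal{B}}$ is an isomorphism from $\mathbb{P}_3$ onto $\mathbb{R}^4$. All vector space operations in $\mathbb{P}_3$ correspond to operations in $\mathbb{R}^4$. ■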
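
The correspondence in Example 5 can be tried numerically. The sketch below, written in Python under the assumption that a polynomial in $\mathbb{P}_3$ is stored as its coordinate vector $[a_0, a_1, a_2, a_3]$, forms a linear combination in $\mathbb{R}^4$ and checks that it agrees with the same combination of the polynomials at several sample points. The helper names and the two sample polynomials are hypothetical.

```python
def poly_eval(coeffs, t):
    """Evaluate a0 + a1*t + a2*t^2 + a3*t^3 from its coordinate vector."""
    return sum(a * t**k for k, a in enumerate(coeffs))

def vec_add(u, v):
    """Entrywise sum of two coordinate vectors."""
    return [a + b for a, b in zip(u, v)]

def vec_scale(r, u):
    """Scalar multiple of a coordinate vector."""
    return [r * a for a in u]

# Hypothetical polynomials p(t) = 1 + 2t - t^3 and q(t) = 4 - t^2 + 5t^3.
p = [1, 2, 0, -1]
q = [4, 0, -1, 5]

# Coordinates of 3p - 2q, computed entirely in R^4.
combo = vec_add(vec_scale(3, p), vec_scale(-2, q))

# The R^4 computation matches the polynomial computation at sample points.
agree = all(
    poly_eval(combo, t) == 3 * poly_eval(p, t) - 2 * poly_eval(q, t)
    for t in range(-3, 4)
)
print(combo, agree)
```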

If we think of $\mathbb{P}_3$ and $\mathbb{R}^4$ as displays on two computer screens that are connected via the coordinate mapping, then every vector space operation in $\mathbb{P}_3$ on one screen is exactly duplicated by a corresponding vector operation in $\mathbb{R}^4$ on the other screen. The vectors on the $\mathbb{P}_3$ screen look different from those on the $\mathbb{R}^4$ screen, but they “act” as vectors in exactly the same way. See Figure 6.

FIGURE 6 The space $\mathbb{P}_3$ is isomorphic to $\mathbb{R}^4$.


EXAMPLE 6 Use coordinate vectors to verify that the polynomials $1 + 2t^2$, $4 + t + 5t^2$, and $3 + 2t$ are linearly dependent in $\mathbb{P}_2$.

SOLUTION The coordinate mapping from Example 5 produces the coordinate vectors (1, 0, 2), (4, 1, 5), and (3, 2, 0), respectively. Writing these vectors as the columns of a matrix $A$, we can determine their independence by row reducing the augmented matrix for $A\mathbf{x} = \mathbf{0}$:

\[
\begin{bmatrix} 1 & 4 & 3 & 0 \\ 0 & 1 & 2 & 0 \\ 2 & 5 & 0 & 0 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 4 & 3 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]

The columns of $A$ are linearly dependent, so the corresponding polynomials are linearly dependent. In fact, it is easy to check that column 3 of $A$ is 2 times column 2 minus 5 times column 1. The corresponding relation for the polynomials is

\[
3 + 2t = 2(4 + t + 5t^2) - 5(1 + 2t^2) \qquad ■
\]
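
The dependence test in Example 6 can be reproduced with a short exact row reduction. This is a minimal Python sketch, not a library routine: the helper `rref` is written here for illustration and uses `fractions.Fraction` to avoid rounding. The columns of `A` are the coordinate vectors listed in the solution.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced echelon form, pivot column indices), exact arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [v / piv for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

# Columns: coordinates of 1 + 2t^2, 4 + t + 5t^2, 3 + 2t.
A = [[1, 4, 3],
     [0, 1, 2],
     [2, 5, 0]]
R, pivots = rref(A)
dependent = len(pivots) < len(A[0])            # fewer pivots than columns
weights = [R[i][2] for i in range(len(pivots))]  # column 3 in terms of columns 1, 2
print(dependent, weights)
```

The entries of `weights` are the multiples of columns 1 and 2 that produce column 3, which is the same information as the polynomial relation displayed above.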


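The final example concerns a plane in $\mathbb{R}^3$ that is isomorphic to $\mathbb{R}^2$.

EXAMPLE 7 Let

\[
\mathbf{v}_1 = \begin{bmatrix} 3 \\ 6 \\ 2 \end{bmatrix},\quad
\mathbf{v}_2 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix},\quad
\mathbf{x} = \begin{bmatrix} 3 \\ 12 \\ 7 \end{bmatrix},
\]

and $\mathcal{B} = \{\mathbf{v}_1, \mathbf{v}_2\}$. Then $\mathcal{B}$ is a basis for $H = \operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$. Determine if $\mathbf{x}$ is in $H$, and if it is, find the coordinate vector of $\mathbf{x}$ relative to $\mathcal{B}$.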

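SOLUTION If $\mathbf{x}$ is in $H$, then the following vector equation is consistent:

\[
c_1\begin{bmatrix} 3 \\ 6 \\ 2 \end{bmatrix} + c_2\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 12 \\ 7 \end{bmatrix}
\]

The scalars $c_1$ and $c_2$, if they exist, are the $\mathcal{B}$-coordinates of $\mathbf{x}$. Using row operations, we obtain

\[
\begin{bmatrix} 3 & -1 & 3 \\ 6 & 0 & 12 \\ 2 & 1 & 7 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 3 \\ 0 & 0 & 0 \end{bmatrix}
\]

Thus $c_1 = 2$, $c_2 = 3$, and $[\mathbf{x}]_{\mathcal{B}} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$. The coordinate system on $H$ determined by $\mathcal{B}$ is shown in Figure 7.

FIGURE 7 A coordinate system on a plane $H$ in $\mathbb{R}^3$.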


If a different basis for $H$ were chosen, would the associated coordinate system also make $H$ isomorphic to $\mathbb{R}^2$? Surely, this must be true. We shall prove it in the next section.
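
The membership test and coordinate computation of Example 7 can be sketched in Python as follows. The helper `rref` is an illustrative exact row reducer written for this sketch (an assumption, not part of the text); the augmented matrix is the one with columns $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{x}$ from the example.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced echelon form, pivot column indices), exact arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [v / piv for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

# Augmented matrix [v1 v2 | x] from Example 7.
M = [[3, -1, 3],
     [6,  0, 12],
     [2,  1, 7]]
R, pivots = rref(M)

# x is in H exactly when the augmented column is not a pivot column.
in_H = 2 not in pivots
coords = [R[i][2] for i in range(len(pivots))] if in_H else None
print(in_H, coords)
```

When `in_H` is true, `coords` holds the $\mathcal{B}$-coordinates $c_1, c_2$ of $\mathbf{x}$.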


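PRACTICE PROBLEMS

1. Let $\mathbf{b}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{b}_2 = \begin{bmatrix} -3 \\ 4 \\ 0 \end{bmatrix}$, $\mathbf{b}_3 = \begin{bmatrix} 3 \\ -6 \\ 3 \end{bmatrix}$, and $\mathbf{x} = \begin{bmatrix} -8 \\ 2 \\ 3 \end{bmatrix}$.

a. Show that the set $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3\}$ is a basis of $\mathbb{R}^3$.

b. Find the change-of-coordinates matrix from $\mathcal{B}$ to the standard basis.

c. Write the equation that relates $\mathbf{x}$ in $\mathbb{R}^3$ to $[\mathbf{x}]_{\mathcal{B}}$.

d. Find $[\mathbf{x}]_{\mathcal{B}}$, for the $\mathbf{x}$ given above.

2. The set $\mathcal{B} = \{1 + t,\ 1 + t^2,\ t + t^2\}$ is a basis for $\mathbb{P}_2$. Find the coordinate vector of $\mathbf{p}(t) = 6 + 3t - t^2$ relative to $\mathcal{B}$.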


4.4 EXERCISES 


In Exercises 1-4, find the vector $\mathbf{x}$ determined by the given coordinate vector $[\mathbf{x}]_{\mathcal{B}}$ and the given basis $\mathcal{B}$.



3. B = 

4. B = 




In Exercises 5-8, find the coordinate vector $[\mathbf{x}]_{\mathcal{B}}$ of $\mathbf{x}$ relative to the given basis $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$.


5. bi = 

1 1 

1_1 

, b 2 = 

2" 

_-5_ 

,x = 

"-2 

1 

6. bi = 

r 

-2 

, b 2 = 

5" 

-6 

,x = 

"4" 

0 



i 

i_ 


1 

LO 

_1 


1 

CN 

1_ 


i 

oo 

1_ 

7. bi = 

i 

UJ i - - 1 

1_ 

, b 2 = 

1 

tJ- o> 

_1 

, b 3 = 

-2 

4_ 

,x = 

1 

so 

_1 



1 


1 

CN 

1 _ 


1 


1 

oo 

_ 1 

8. bi = 

0 

, b 2 = 

1 

, b 3 = 

-1 

,x = 

-5 


i 

LO 

1 _ 


1 

OO 

_ 1 


1 

CN 

_ 1 


1 

'xf 

In Exercises 9 and 10, find the change-of-coordinates matrix from $\mathcal{B}$ to the standard basis in $\mathbb{R}^n$.


9. B = 




r 

1 

LO 

_1 


1 

to 

_1 


1 

00 

1 _ 

1 

10. B = { 

-1 

5 

0 


-2 

i 

\ 

1 

_ 1 


1 

1 _ 


7_ 

J 


In Exercises 11 and 12, use an inverse matrix to find $[\mathbf{x}]_{\mathcal{B}}$ for the given $\mathbf{x}$ and $\mathcal{B}$.


11 . B = 

12. B = 



13. The set B = {1 + t 2 , t + t 2 ,1 + 2t + t 2 } is a basis for P 2 . 
Find the coordinate vector of p (t) = 1 + 4 1 + It 2 relative 
to B. 


14. The set B = {1 — t 2 , t — t 2 ,2 — 2t + t 2 } is a basis for P 2 . 
Find the coordinate vector of \)(t) = 3 + t — 6t 2 relative 
to B. 


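In Exercises 15 and 16, mark each statement True or False. Justify each answer. Unless stated otherwise, $\mathcal{B}$ is a basis for a vector space $V$.

15. a. If $\mathbf{x}$ is in $V$ and if $\mathcal{B}$ contains $n$ vectors, then the $\mathcal{B}$-coordinate vector of $\mathbf{x}$ is in $\mathbb{R}^n$.

b. If $P_{\mathcal{B}}$ is the change-of-coordinates matrix, then $[\mathbf{x}]_{\mathcal{B}} = P_{\mathcal{B}}\mathbf{x}$, for $\mathbf{x}$ in $V$.

c. The vector spaces $\mathbb{P}_3$ and $\mathbb{R}^3$ are isomorphic.

16. a. If $\mathcal{B}$ is the standard basis for $\mathbb{R}^n$, then the $\mathcal{B}$-coordinate vector of an $\mathbf{x}$ in $\mathbb{R}^n$ is $\mathbf{x}$ itself.

b. The correspondence $[\mathbf{x}]_{\mathcal{B}} \mapsto \mathbf{x}$ is called the coordinate mapping.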

c. In some cases, a plane in R 3 can be isomorphic to P 2 . 


17. The vectors Vi = 



span IR 2 


but do not form a basis. Find two different ways to express 
\ as a linear combination of v 1 , v 2 , v 3 . 


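18. Let $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ be a basis for a vector space $V$. Explain why the $\mathcal{B}$-coordinate vectors of $\mathbf{b}_1, \ldots, \mathbf{b}_n$ are the columns $\mathbf{e}_1, \ldots, \mathbf{e}_n$ of the $n \times n$ identity matrix.

19. Let $S$ be a finite set in a vector space $V$ with the property that every $\mathbf{x}$ in $V$ has a unique representation as a linear combination of elements of $S$. Show that $S$ is a basis of $V$.

20. Suppose $\{\mathbf{v}_1, \ldots, \mathbf{v}_4\}$ is a linearly dependent spanning set for a vector space $V$. Show that each $\mathbf{w}$ in $V$ can be expressed in more than one way as a linear combination of $\mathbf{v}_1, \ldots, \mathbf{v}_4$. [Hint: Let $\mathbf{w} = k_1\mathbf{v}_1 + \cdots + k_4\mathbf{v}_4$ be an arbitrary vector in $V$. Use the linear dependence of $\{\mathbf{v}_1, \ldots, \mathbf{v}_4\}$ to produce another representation of $\mathbf{w}$ as a linear combination of $\mathbf{v}_1, \ldots, \mathbf{v}_4$.]

21. Let $\mathcal{B} = \left\{ \begin{bmatrix} 1 \\ -4 \end{bmatrix}, \begin{bmatrix} -2 \\ 9 \end{bmatrix} \right\}$. Since the coordinate mapping determined by $\mathcal{B}$ is a linear transformation from $\mathbb{R}^2$ into $\mathbb{R}^2$, this mapping must be implemented by some $2 \times 2$ matrix $A$. Find it. [Hint: Multiplication by $A$ should transform a vector $\mathbf{x}$ into its coordinate vector $[\mathbf{x}]_{\mathcal{B}}$.]

22. Let $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ be a basis for $\mathbb{R}^n$. Produce a description of an $n \times n$ matrix $A$ that implements the coordinate mapping $\mathbf{x} \mapsto [\mathbf{x}]_{\mathcal{B}}$. (See Exercise 21.)

Exercises 23-26 concern a vector space $V$, a basis $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$, and the coordinate mapping $\mathbf{x} \mapsto [\mathbf{x}]_{\mathcal{B}}$.

23. Show that the coordinate mapping is one-to-one. [Hint: Suppose $[\mathbf{u}]_{\mathcal{B}} = [\mathbf{w}]_{\mathcal{B}}$ for some $\mathbf{u}$ and $\mathbf{w}$ in $V$, and show that $\mathbf{u} = \mathbf{w}$.]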


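24. Show that the coordinate mapping is onto $\mathbb{R}^n$. That is, given any $\mathbf{y}$ in $\mathbb{R}^n$, with entries $y_1, \ldots, y_n$, produce $\mathbf{u}$ in $V$ such that $[\mathbf{u}]_{\mathcal{B}} = \mathbf{y}$.

25. Show that a subset $\{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$ in $V$ is linearly independent if and only if the set of coordinate vectors $\{[\mathbf{u}_1]_{\mathcal{B}}, \ldots, [\mathbf{u}_p]_{\mathcal{B}}\}$ is linearly independent in $\mathbb{R}^n$. [Hint: Since the coordinate mapping is one-to-one, the following equations have the same solutions, $c_1, \ldots, c_p$.]

\[
c_1\mathbf{u}_1 + \cdots + c_p\mathbf{u}_p = \mathbf{0} \qquad \text{The zero vector in } V
\]
\[
[c_1\mathbf{u}_1 + \cdots + c_p\mathbf{u}_p]_{\mathcal{B}} = [\mathbf{0}]_{\mathcal{B}} \qquad \text{The zero vector in } \mathbb{R}^n
\]

26. Given vectors $\mathbf{u}_1, \ldots, \mathbf{u}_p$, and $\mathbf{w}$ in $V$, show that $\mathbf{w}$ is a linear combination of $\mathbf{u}_1, \ldots, \mathbf{u}_p$ if and only if $[\mathbf{w}]_{\mathcal{B}}$ is a linear combination of the coordinate vectors $[\mathbf{u}_1]_{\mathcal{B}}, \ldots, [\mathbf{u}_p]_{\mathcal{B}}$.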

In Exercises 27-30, use coordinate vectors to test the linear independence of the sets of polynomials. Explain your work.

27. 1 T 2t 3 , 2 T t — 3 1 2 , —t T 2 1~ — t 3 

28. 1 - 2 1 2 - t 3 , t + 2 1 3 , 1 + t - It 2 

29. (1 — t ) 2 , t — It 2 + t 3 , (1 — 0 3 

30. (2 - 0 3 , (3 - t) 2 , 1 + 6t - 5t 2 + t 3 

31. Use coordinate vectors to test whether the following sets of 
polynomials span P 2 . Justify your conclusions. 

a. 1 — 3 1 A- 5t 2 , —3 -\- 5t — It 2 , —4 + 5t — 61 2 ,1 — t 2 

b. 5t T t 2 , 1 — 8 1 — 2 1 2 , —3 + 4 1 T 2 1 2 , 2 — 3 1 



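32. Let $\mathbf{p}_1(t) = 1 + t^2$, $\mathbf{p}_2(t) = t - 3t^2$, $\mathbf{p}_3(t) = 1 + t - 3t^2$.

a. Use coordinate vectors to show that these polynomials form a basis for $\mathbb{P}_2$.

b. Consider the basis $\mathcal{B} = \{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$ for $\mathbb{P}_2$. Find $\mathbf{q}$ in $\mathbb{P}_2$,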


given that $[\mathbf{q}]_{\mathcal{B}} =$

In Exercises 33 and 34, determine whether the sets of polynomials form a basis for $\mathbb{P}_3$. Justify your conclusions.


33. [M] 3 + It, 5 + t - 2 1 \ t - 2 1 2 , 1 + 16 1 - 6 1 2 + 2 1 3 

34. [M] 5 - 3t + At 2 + 2t\ 9 + t + 8 r 2 - 6t\ 6-2t + 5t 2 , t 3 

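35. [M] Let $H = \operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$ and $\mathcal{B} = \{\mathbf{v}_1, \mathbf{v}_2\}$. Show that $\mathbf{x}$ is in $H$ and find the $\mathcal{B}$-coordinate vector of $\mathbf{x}$, for

\[
\mathbf{v}_1 = \begin{bmatrix} 11 \\ -5 \\ 10 \\ 7 \end{bmatrix},\quad
\mathbf{v}_2 = \begin{bmatrix} 14 \\ -8 \\ 13 \\ 10 \end{bmatrix},\quad
\mathbf{x} = \begin{bmatrix} 19 \\ -13 \\ 18 \\ 15 \end{bmatrix}
\]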


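36. [M] Let $H = \operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ and $\mathcal{B} = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. Show that $\mathcal{B}$ is a basis for $H$ and $\mathbf{x}$ is in $H$, and find the $\mathcal{B}$-coordinate vector of $\mathbf{x}$, for

\[
\mathbf{v}_1 = \begin{bmatrix} -6 \\ 4 \\ -9 \\ 4 \end{bmatrix},\quad
\mathbf{v}_2 = \begin{bmatrix} 8 \\ -3 \\ 7 \\ -3 \end{bmatrix},\quad
\mathbf{v}_3 = \begin{bmatrix} -9 \\ 5 \\ -8 \\ 3 \end{bmatrix},\quad
\mathbf{x} = \begin{bmatrix} 4 \\ 7 \\ -8 \\ 3 \end{bmatrix}
\]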


[M] Exercises 37 and 38 concern the crystal lattice for titanium, which has the hexagonal structure shown on the left in the accompanying figure. The vectors



form a basis for the unit cell shown on the right. The numbers here are Ångstrom units (1 Å = $10^{-8}$ cm). In alloys of titanium, some additional atoms may be in the unit cell at the octahedral and tetrahedral sites (so named because of the geometric objects formed by atoms at these locations).



The hexagonal close-packed lattice and its unit cell. 


37. One of the octahedral sites is $\begin{bmatrix} 1/2 \\ 1/4 \\ 1/6 \end{bmatrix}$, relative to the lattice basis. Determine the coordinates of this site relative to the standard basis of $\mathbb{R}^3$.

38. One of the tetrahedral sites is $\begin{bmatrix} 1/2 \\ 1/2 \\ 1/3 \end{bmatrix}$. Determine the coordinates of this site relative to the standard basis of $\mathbb{R}^3$.


SOLUTIONS TO PRACTICE PROBLEMS 


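1. a. It is evident that the matrix $P_{\mathcal{B}} = [\,\mathbf{b}_1\ \ \mathbf{b}_2\ \ \mathbf{b}_3\,]$ is row-equivalent to the identity matrix. By the Invertible Matrix Theorem, $P_{\mathcal{B}}$ is invertible and its columns form a basis for $\mathbb{R}^3$.

b. From part (a), the change-of-coordinates matrix is $P_{\mathcal{B}} = \begin{bmatrix} 1 & -3 & 3 \\ 0 & 4 & -6 \\ 0 & 0 & 3 \end{bmatrix}$.

c. $\mathbf{x} = P_{\mathcal{B}}[\mathbf{x}]_{\mathcal{B}}$

d. To solve the equation in (c), it is probably easier to row reduce an augmented matrix than to compute $P_{\mathcal{B}}^{-1}$:

\[
\underbrace{\begin{bmatrix} 1 & -3 & 3 & -8 \\ 0 & 4 & -6 & 2 \\ 0 & 0 & 3 & 3 \end{bmatrix}}_{[\,P_{\mathcal{B}}\ \ \mathbf{x}\,]}
\sim
\underbrace{\begin{bmatrix} 1 & 0 & 0 & -5 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 1 \end{bmatrix}}_{[\,I\ \ [\mathbf{x}]_{\mathcal{B}}\,]}
\]

Hence $[\mathbf{x}]_{\mathcal{B}} = \begin{bmatrix} -5 \\ 2 \\ 1 \end{bmatrix}$.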




2. The coordinates of $\mathbf{p}(t) = 6 + 3t - t^2$ with respect to $\mathcal{B}$ satisfy

\[
c_1(1 + t) + c_2(1 + t^2) + c_3(t + t^2) = 6 + 3t - t^2
\]

Equating coefficients of like powers of $t$, we have

\[
\begin{aligned}
c_1 + c_2 &= 6 \\
c_1 + c_3 &= 3 \\
c_2 + c_3 &= -1
\end{aligned}
\]

Solving, we find that $c_1 = 5$, $c_2 = 1$, $c_3 = -2$, and $[\mathbf{p}]_{\mathcal{B}} = \begin{bmatrix} 5 \\ 1 \\ -2 \end{bmatrix}$.



4.5 THE DIMENSION OF A VECTOR SPACE 


Theorem 8 in Section 4.4 implies that a vector space $V$ with a basis $\mathcal{B}$ containing $n$ vectors is isomorphic to $\mathbb{R}^n$. This section shows that this number $n$ is an intrinsic property (called the dimension) of the space $V$ that does not depend on the particular choice of basis. The discussion of dimension will give additional insight into properties of bases.

The first theorem generalizes a well-known result about the vector space $\mathbb{R}^n$.


THEOREM 9 If a vector space $V$ has a basis $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$, then any set in $V$ containing more than $n$ vectors must be linearly dependent.


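PROOF Let $\{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$ be a set in $V$ with more than $n$ vectors. The coordinate vectors $[\mathbf{u}_1]_{\mathcal{B}}, \ldots, [\mathbf{u}_p]_{\mathcal{B}}$ form a linearly dependent set in $\mathbb{R}^n$, because there are more vectors ($p$) than entries ($n$) in each vector. So there exist scalars $c_1, \ldots, c_p$, not all zero, such that

\[
c_1[\mathbf{u}_1]_{\mathcal{B}} + \cdots + c_p[\mathbf{u}_p]_{\mathcal{B}} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix} \qquad \text{The zero vector in } \mathbb{R}^n
\]

Since the coordinate mapping is a linear transformation,

\[
[c_1\mathbf{u}_1 + \cdots + c_p\mathbf{u}_p]_{\mathcal{B}} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}
\]

The zero vector on the right displays the $n$ weights needed to build the vector $c_1\mathbf{u}_1 + \cdots + c_p\mathbf{u}_p$ from the basis vectors in $\mathcal{B}$. That is, $c_1\mathbf{u}_1 + \cdots + c_p\mathbf{u}_p = 0 \cdot \mathbf{b}_1 + \cdots + 0 \cdot \mathbf{b}_n = \mathbf{0}$. Since the $c_i$ are not all zero, $\{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$ is linearly dependent.¹ ■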


Theorem 9 implies that if a vector space $V$ has a basis $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$, then each linearly independent set in $V$ has no more than $n$ vectors.


¹ Theorem 9 also applies to infinite sets in $V$. An infinite set is said to be linearly dependent if some finite subset is linearly dependent; otherwise, the set is linearly independent. If $S$ is an infinite set in $V$, take any subset $\{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$ of $S$, with $p > n$. The proof above shows that this subset is linearly dependent, and hence so is $S$.

THEOREM 10 If a vector space $V$ has a basis of $n$ vectors, then every basis of $V$ must consist of exactly $n$ vectors.

PROOF Let $\mathcal{B}_1$ be a basis of $n$ vectors and $\mathcal{B}_2$ be any other basis (of $V$). Since $\mathcal{B}_1$ is a basis and $\mathcal{B}_2$ is linearly independent, $\mathcal{B}_2$ has no more than $n$ vectors, by Theorem 9. Also, since $\mathcal{B}_2$ is a basis and $\mathcal{B}_1$ is linearly independent, $\mathcal{B}_2$ has at least $n$ vectors. Thus $\mathcal{B}_2$ consists of exactly $n$ vectors. ■

If a nonzero vector space V is spanned by a finite set S , then a subset of S is a 
basis for V, by the Spanning Set Theorem. In this case, Theorem 10 ensures that the 
following definition makes sense. 

DEFINITION If V is spanned by a finite set, then V is said to be finite-dimensional, and the 

dimension of V , written as dim V , is the number of vectors in a basis for V . The 
dimension of the zero vector space {0} is defined to be zero. If V is not spanned 
by a finite set, then V is said to be infinite-dimensional. 


EXAMPLE 1 The standard basis for $\mathbb{R}^n$ contains $n$ vectors, so $\dim \mathbb{R}^n = n$. The standard polynomial basis $\{1, t, t^2\}$ shows that $\dim \mathbb{P}_2 = 3$. In general, $\dim \mathbb{P}_n = n + 1$. The space $\mathbb{P}$ of all polynomials is infinite-dimensional (Exercise 27). ■



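EXAMPLE 2 Let $H = \operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$, where $\mathbf{v}_1 = \begin{bmatrix} 3 \\ 6 \\ 2 \end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$. Then $H$ is the plane studied in Example 7 in Section 4.4. A basis for $H$ is $\{\mathbf{v}_1, \mathbf{v}_2\}$, since $\mathbf{v}_1$ and $\mathbf{v}_2$ are not multiples and hence are linearly independent. Thus $\dim H = 2$. ■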


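EXAMPLE 3 Find the dimension of the subspace

\[
H = \left\{ \begin{bmatrix} a - 3b + 6c \\ 5a + 4d \\ b - 2c - d \\ 5d \end{bmatrix} : a, b, c, d \text{ in } \mathbb{R} \right\}
\]

SOLUTION It is easy to see that $H$ is the set of all linear combinations of the vectors

\[
\mathbf{v}_1 = \begin{bmatrix} 1 \\ 5 \\ 0 \\ 0 \end{bmatrix},\quad
\mathbf{v}_2 = \begin{bmatrix} -3 \\ 0 \\ 1 \\ 0 \end{bmatrix},\quad
\mathbf{v}_3 = \begin{bmatrix} 6 \\ 0 \\ -2 \\ 0 \end{bmatrix},\quad
\mathbf{v}_4 = \begin{bmatrix} 0 \\ 4 \\ -1 \\ 5 \end{bmatrix}
\]

Clearly, $\mathbf{v}_1 \neq \mathbf{0}$, $\mathbf{v}_2$ is not a multiple of $\mathbf{v}_1$, but $\mathbf{v}_3$ is a multiple of $\mathbf{v}_2$. By the Spanning Set Theorem, we may discard $\mathbf{v}_3$ and still have a set that spans $H$. Finally, $\mathbf{v}_4$ is not a linear combination of $\mathbf{v}_1$ and $\mathbf{v}_2$. So $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_4\}$ is linearly independent (by Theorem 4 in Section 4.3) and hence is a basis for $H$. Thus $\dim H = 3$. ■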
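
The conclusion of Example 3 can be cross-checked by row reducing the matrix whose columns are $\mathbf{v}_1, \ldots, \mathbf{v}_4$: the pivot columns indicate which vectors to keep. This is a minimal Python sketch with an illustrative exact `rref` helper (an assumption written for this sketch, not a library function).

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced echelon form, pivot column indices), exact arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [v / piv for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

# Columns are v1, v2, v3, v4 from Example 3.
V = [[1, -3,  6,  0],
     [5,  0,  0,  4],
     [0,  1, -2, -1],
     [0,  0,  0,  5]]
_, pivots = rref(V)
dim_H = len(pivots)                            # number of pivot columns
basis_labels = [f"v{j + 1}" for j in pivots]   # vectors kept as a basis
print(dim_H, basis_labels)
```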


EXAMPLE 4 The subspaces of $\mathbb{R}^3$ can be classified by dimension. See Figure 1.

0-dimensional subspaces. Only the zero subspace.

1-dimensional subspaces. Any subspace spanned by a single nonzero vector. Such subspaces are lines through the origin.

2-dimensional subspaces. Any subspace spanned by two linearly independent vectors. Such subspaces are planes through the origin.

3-dimensional subspaces. Only $\mathbb{R}^3$ itself. Any three linearly independent vectors in $\mathbb{R}^3$ span all of $\mathbb{R}^3$, by the Invertible Matrix Theorem. ■

FIGURE 1 Sample subspaces of $\mathbb{R}^3$.


Subspaces of a Finite-Dimensional Space 

The next theorem is a natural counterpart to the Spanning Set Theorem. 


THEOREM 11 Let $H$ be a subspace of a finite-dimensional vector space $V$. Any linearly independent set in $H$ can be expanded, if necessary, to a basis for $H$. Also, $H$ is finite-dimensional and

\[
\dim H \le \dim V
\]


PROOF If $H = \{\mathbf{0}\}$, then certainly $\dim H = 0 \le \dim V$. Otherwise, let $S = \{\mathbf{u}_1, \ldots, \mathbf{u}_k\}$ be any linearly independent set in $H$. If $S$ spans $H$, then $S$ is a basis for $H$. Otherwise, there is some $\mathbf{u}_{k+1}$ in $H$ that is not in $\operatorname{Span} S$. But then $\{\mathbf{u}_1, \ldots, \mathbf{u}_k, \mathbf{u}_{k+1}\}$ will be linearly independent, because no vector in the set can be a linear combination of vectors that precede it (by Theorem 4).

So long as the new set does not span $H$, we can continue this process of expanding $S$ to a larger linearly independent set in $H$. But the number of vectors in a linearly independent expansion of $S$ can never exceed the dimension of $V$, by Theorem 9. So eventually the expansion of $S$ will span $H$ and hence will be a basis for $H$, and $\dim H \le \dim V$. ■

When the dimension of a vector space or subspace is known, the search for a basis 
is simplified by the next theorem. It says that if a set has the right number of elements, 
then one has only to show either that the set is linearly independent or that it spans the 
space. The theorem is of critical importance in numerous applied problems (involving 
differential equations or difference equations, for example) where linear independence 
is much easier to verify than spanning. 

THEOREM 12 The Basis Theorem

Let $V$ be a $p$-dimensional vector space, $p \ge 1$. Any linearly independent set of exactly $p$ elements in $V$ is automatically a basis for $V$. Any set of exactly $p$ elements that spans $V$ is automatically a basis for $V$.

PROOF By Theorem 11, a linearly independent set $S$ of $p$ elements can be extended to a basis for $V$. But that basis must contain exactly $p$ elements, since $\dim V = p$. So $S$ must already be a basis for $V$. Now suppose that $S$ has $p$ elements and spans $V$. Since $V$ is nonzero, the Spanning Set Theorem implies that a subset $S'$ of $S$ is a basis of $V$. Since $\dim V = p$, $S'$ must contain $p$ vectors. Hence $S = S'$. ■


The Dimensions of Nul A and Col A 

Since the pivot columns of a matrix A form a basis for Col A , we know the dimension 
of Col A as soon as we know the pivot columns. The dimension of Nul A might seem to 
require more work, since finding a basis for Nul A usually takes more time than a basis 
for Col A. But there is a shortcut! 

Let $A$ be an $m \times n$ matrix, and suppose the equation $A\mathbf{x} = \mathbf{0}$ has $k$ free variables. From Section 4.2, we know that the standard method of finding a spanning set for Nul $A$ will produce exactly $k$ linearly independent vectors, say $\mathbf{u}_1, \ldots, \mathbf{u}_k$, one for each free variable. So $\{\mathbf{u}_1, \ldots, \mathbf{u}_k\}$ is a basis for Nul $A$, and the number of free variables determines the size of the basis. Let us summarize these facts for future reference.


The dimension of Nul A is the number of free variables in the equation Ax = 0, 
and the dimension of Col A is the number of pivot columns in A . 


EXAMPLE 5 Find the dimensions of the null space and the column space of



1 

3 

8 



SOLUTION Row reduce the augmented matrix $[\,A\ \ \mathbf{0}\,]$ to echelon form:

\[
\begin{bmatrix}
1 & -2 & 2 & 3 & -1 & 0 \\
0 & 0 & 1 & 2 & -2 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

There are three free variables: $x_2$, $x_4$, and $x_5$. Hence the dimension of Nul $A$ is 3. Also, $\dim \operatorname{Col} A = 2$ because $A$ has two pivot columns. ■
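
The counting in Example 5 can be sketched in Python: pivot columns give $\dim \operatorname{Col} A$, and the remaining columns correspond to free variables and give $\dim \operatorname{Nul} A$. The sketch starts from the coefficient part of the echelon form displayed in the solution; the helper `rref` is an illustrative exact row reducer written for this sketch.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced echelon form, pivot column indices), exact arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [v / piv for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

# Coefficient part of the echelon form from Example 5.
E = [[1, -2, 2, 3, -1],
     [0,  0, 1, 2, -2],
     [0,  0, 0, 0,  0]]
_, pivots = rref(E)
n = len(E[0])
free = [j for j in range(n) if j not in pivots]  # columns without a pivot
dim_col = len(pivots)
dim_nul = len(free)
print([f"x{j + 1}" for j in free], dim_nul, dim_col)
```

Note that `dim_col + dim_nul` equals the number of columns, since every column is either a pivot column or corresponds to a free variable.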


PRACTICE PROBLEMS 

1. Decide whether each statement is True or False, and give a reason for each answer. 
Here V is a nonzero finite-dimensional vector space. 

a. If dim V = p and if S is a linearly dependent subset of V, then S contains more 
than p vectors. 

b. If S spans V and if T is a subset of V that contains more vectors than S, then T 
is linearly dependent. 

2. Let $H$ and $K$ be subspaces of a vector space $V$. In Section 4.1 Exercise 32 it is established that $H \cap K$ is also a subspace of $V$. Prove $\dim(H \cap K) \le \dim H$.

4.5 EXERCISES


For each subspace in Exercises 1-8, (a) find a basis, and (b) state 
the dimension. 


f 

1 

Co 

to 

_1 

I [ 

4s 

) 

) 

s 4~ t 

: s, t in E > 2. < 

—3s 

: s,t in E [ 

l 

1 

OJ 

1_ 

1 1 

-t 

) 



i_ 

-1 

i 

o 


i_ 

4 

i 

17. A = 

0 

4 

7 

18. A = 

0 

7 

0 


i 

o 

0 

1 _ 


i 

o 

0 

1 

o 


In Exercises 19 and 20, V is a vector space. Mark each statement 
True or False. Justify each answer. 



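3. $\left\{ \begin{bmatrix} 2c \\ a - b \\ b - 3c \\ a + 2b \end{bmatrix} : a, b, c \text{ in } \mathbb{R} \right\}$

4. $\left\{ \begin{bmatrix} a + b \\ 2a \\ 3a - b \\ -b \end{bmatrix} : a, b \text{ in } \mathbb{R} \right\}$

5. $\left\{ \begin{bmatrix} a - 4b - 2c \\ 2a + 5b - 4c \\ -a + 2c \\ -3a + 7b + 6c \end{bmatrix} : a, b, c \text{ in } \mathbb{R} \right\}$

6. $\left\{ \begin{bmatrix} 3a + 6b - c \\ 6a - 2b - 2c \\ -9a + 5b + 3c \\ -3a + b + c \end{bmatrix} : a, b, c \text{ in } \mathbb{R} \right\}$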


7. $\{(a, b, c) : a - 3b + c = 0,\ b - 2c = 0,\ 2b - c = 0\}$

8. $\{(a, b, c, d) : a - 3b + c = 0\}$

9. Find the dimension of the subspace of all vectors in $\mathbb{R}^3$ whose first and third entries are equal.


19. a. The number of pivot columns of a matrix equals the dimension of its column space.

b. A plane in $\mathbb{R}^3$ is a two-dimensional subspace of $\mathbb{R}^3$.

c. The dimension of the vector space $\mathbb{P}_4$ is 4.

d. If $\dim V = n$ and $S$ is a linearly independent set in $V$, then $S$ is a basis for $V$.

e. If a set $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ spans a finite-dimensional vector space $V$ and if $T$ is a set of more than $p$ vectors in $V$, then $T$ is linearly dependent.

20. a. $\mathbb{R}^2$ is a two-dimensional subspace of $\mathbb{R}^3$.

b. The number of variables in the equation $A\mathbf{x} = \mathbf{0}$ equals the dimension of Nul $A$.

c. A vector space is infinite-dimensional if it is spanned by an infinite set.

d. If $\dim V = n$ and if $S$ spans $V$, then $S$ is a basis of $V$.

e. The only three-dimensional subspace of $\mathbb{R}^3$ is $\mathbb{R}^3$ itself.


10. Find the dimension of the subspace $H$ of $\mathbb{R}^2$ spanned by


-1 

to 

_1 


1_ 


i 

OJ 

_1 

1- 

Lh 

1_ 

5 

1 

O 

_i 


1 

so 

_1 


In Exercises 11 and 12, find the dimension of the subspace 
spanned by the given vectors. 



1 


1 

CO 

_1 


1 

SO 

_ 1 


-1 

11. 

0 

5 

1 

5 

4 

1 

-3 


1 

CN 

_1 


1 


i 

CN 

_1 


1 



1 


1 

LO 

_1 


i 

oo 

1 _ 


1 

_1 

12 . 

-2 

5 

4 

5 

6 

5 

0 


i 

o 

_i 


1 


1 

Ul 

1 _ 


7 


Determine the dimensions of Nul A and Col A for the matrices 
shown in Exercises 13-18. 


13. A = 


1 

0 

0 

0 


-6 9 
1 2 
0 0 
0 0 




14. A = 

15. A = 

16. A = 


1 3 

0 0 
0 0 
0 0 

1 0 9 

0 0 1 

3 4" 

-6 10 


-4 2 

1 -3 

0 1 
0 0 

5" 
-4 


-1 6 
7 0 

4 -3 

0 0 


21. The first four Hermite polynomials are $1$, $2t$, $-2 + 4t^2$, and $-12t + 8t^3$. These polynomials arise naturally in the study of certain important differential equations in mathematical physics.² Show that the first four Hermite polynomials form a basis of $\mathbb{P}_3$.

22. The first four Laguerre polynomials are $1$, $1 - t$, $2 - 4t + t^2$, and $6 - 18t + 9t^2 - t^3$. Show that these polynomials form a basis of $\mathbb{P}_3$.

23. Let $\mathcal{B}$ be the basis of $\mathbb{P}_3$ consisting of the Hermite polynomials in Exercise 21, and let $\mathbf{p}(t) = 7 - 12t - 8t^2 + 12t^3$. Find the coordinate vector of $\mathbf{p}$ relative to $\mathcal{B}$.

24. Let $\mathcal{B}$ be the basis of $\mathbb{P}_2$ consisting of the first three Laguerre polynomials listed in Exercise 22, and let $\mathbf{p}(t) = 7 - 8t + 3t^2$. Find the coordinate vector of $\mathbf{p}$ relative to $\mathcal{B}$.

25. Let $S$ be a subset of an $n$-dimensional vector space $V$, and suppose $S$ contains fewer than $n$ vectors. Explain why $S$ cannot span $V$.

26. Let $H$ be an $n$-dimensional subspace of an $n$-dimensional vector space $V$. Show that $H = V$.

27. Explain why the space $\mathbb{P}$ of all polynomials is an infinite-dimensional space.


² See Introduction to Functional Analysis, 2nd ed., by A. E. Taylor and David C. Lay (New York: John Wiley & Sons, 1980), pp. 92-93. Other sets of polynomials are discussed there, too.

28. Show that the space $C(\mathbb{R})$ of all continuous functions defined on the real line is an infinite-dimensional space.

In Exercises 29 and 30, V is a nonzero finite-dimensional vector 
space, and the vectors listed belong to V. Mark each statement 
True or False. Justify each answer. (These questions are more 
difficult than those in Exercises 19 and 20.) 

29. a. If there exists a set $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ that spans $V$, then

dim V < p . 

b. If there exists a linearly independent set {vi,..., v p } in 
V , then dim V > p . 

c. If dim V = p , then there exists a spanning set of p + 1 
vectors in V . 

30. a. If there exists a linearly dependent set {vi,..., v p } in V, 

then dim V < p . 

b. If every set of p elements in V fails to span V, then 
dim V > p . 

c. If p > 2 and dim V = p , then every set of p — 1 nonzero 
vectors is linearly independent. 

Exercises 31 and 32 concern finite-dimensional vector spaces V 
and W and a linear transformation T :V —> W. 

31. Let $H$ be a nonzero subspace of $V$, and let $T(H)$ be the set of images of vectors in $H$. Then $T(H)$ is a subspace of $W$, by Exercise 35 in Section 4.2. Prove that $\dim T(H) \le \dim H$.

32. Let $H$ be a nonzero subspace of $V$, and suppose $T$ is a one-to-one (linear) mapping of $V$ into $W$. Prove that $\dim T(H) = \dim H$. If $T$ happens to be a one-to-one mapping of $V$ onto $W$, then $\dim V = \dim W$. Isomorphic finite-dimensional vector spaces have the same dimension.


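33. [M] According to Theorem 11, a linearly independent set $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ in $\mathbb{R}^n$ can be expanded to a basis for $\mathbb{R}^n$. One way to do this is to create $A = [\,\mathbf{v}_1\ \cdots\ \mathbf{v}_k\ \ \mathbf{e}_1\ \cdots\ \mathbf{e}_n\,]$, with $\mathbf{e}_1, \ldots, \mathbf{e}_n$ the columns of the identity matrix; the pivot columns of $A$ form a basis for $\mathbb{R}^n$.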


a. Use the method described to extend the following vectors 
to a basis for R 5 : 



-9 


9" 


i 

i_ 

-7 


4 


7 

OO 

, v 2 = 

1 

, v 3 = 

OO 

-5 


6 


5 

7 _ 


-7 


_ —7 _ 


b. Explain why the method works in general: Why are the original vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$ included in the basis found for Col $A$? Why is $\operatorname{Col} A = \mathbb{R}^n$?


34. [M] Let $\mathcal{B} = \{1, \cos t, \cos^2 t, \ldots, \cos^6 t\}$ and $\mathcal{C} = \{1, \cos t, \cos 2t, \ldots, \cos 6t\}$. Assume the following trigonometric identities (see Exercise 37 in Section 4.1).

\[
\begin{aligned}
\cos 2t &= -1 + 2\cos^2 t \\
\cos 3t &= -3\cos t + 4\cos^3 t \\
\cos 4t &= 1 - 8\cos^2 t + 8\cos^4 t \\
\cos 5t &= 5\cos t - 20\cos^3 t + 16\cos^5 t \\
\cos 6t &= -1 + 18\cos^2 t - 48\cos^4 t + 32\cos^6 t
\end{aligned}
\]

Let $H$ be the subspace of functions spanned by the functions in $\mathcal{B}$. Then $\mathcal{B}$ is a basis for $H$, by Exercise 38 in Section 4.3.

a. Write the $\mathcal{B}$-coordinate vectors of the vectors in $\mathcal{C}$, and use them to show that $\mathcal{C}$ is a linearly independent set in $H$.

b. Explain why $\mathcal{C}$ is a basis for $H$.


SOLUTIONS TO PRACTICE PROBLEMS 

1. a. False. Consider the set {0}. 

b. True. By the Spanning Set Theorem, S contains a basis for V ; call that basis S '. 
Then T will contain more vectors than S'. By Theorem 9, T is linearly dependent. 

2. Let $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ be a basis for $H \cap K$. Notice $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ is a linearly independent subset of $H$, hence by Theorem 11, $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ can be expanded, if necessary, to a basis for $H$. Since the dimension of a subspace is just the number of vectors in a basis, it follows that $\dim(H \cap K) = p \le \dim H$.


4.6 RANK 


With the aid of vector space concepts, this section takes a look inside a matrix and 
reveals several interesting and useful relationships hidden in its rows and columns. 

For instance, imagine placing 2000 random numbers into a 40 x 50 matrix A and 
then determining both the maximum number of linearly independent columns in A and 
the maximum number of linearly independent columns in A T (rows in A). Remarkably, 




















the two numbers are the same. As we’ll soon see, their common value is the rank of the 
matrix. To explain why, we need to examine the subspace spanned by the rows of A. 

The Row Space 

If $A$ is an $m \times n$ matrix, each row of $A$ has $n$ entries and thus can be identified with a vector in $\mathbb{R}^n$. The set of all linear combinations of the row vectors is called the row space of $A$ and is denoted by Row $A$. Each row has $n$ entries, so Row $A$ is a subspace of $\mathbb{R}^n$. Since the rows of $A$ are identified with the columns of $A^T$, we could also write Col $A^T$ in place of Row $A$.

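EXAMPLE 1 Let

\[
A = \begin{bmatrix}
-2 & -5 & 8 & 0 & -17 \\
1 & 3 & -5 & 1 & 5 \\
3 & 11 & -19 & 7 & 1 \\
1 & 7 & -13 & 5 & -3
\end{bmatrix}
\quad\text{and}\quad
\begin{aligned}
\mathbf{r}_1 &= (-2, -5, 8, 0, -17) \\
\mathbf{r}_2 &= (1, 3, -5, 1, 5) \\
\mathbf{r}_3 &= (3, 11, -19, 7, 1) \\
\mathbf{r}_4 &= (1, 7, -13, 5, -3)
\end{aligned}
\]

The row space of $A$ is the subspace of $\mathbb{R}^5$ spanned by $\{\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \mathbf{r}_4\}$. That is, Row $A = \operatorname{Span}\{\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \mathbf{r}_4\}$. It is natural to write row vectors horizontally; however, they may also be written as column vectors if that is more convenient. ■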

If we knew some linear dependence relations among the rows of matrix A in 
Example 1, we could use the Spanning Set Theorem to shrink the spanning set to a 
basis. Unfortunately, row operations on A will not give us that information, because 
row operations change the row-dependence relations. But row reducing A is certainly 
worthwhile, as the next theorem shows! 

THEOREM 13 If two matrices A and B are row equivalent, then their row spaces are the same. 

If B is in echelon form, the nonzero rows of B form a basis for the row space of 
A as well as for that of B . 


PROOF If $B$ is obtained from $A$ by row operations, the rows of $B$ are linear combinations of the rows of $A$. It follows that any linear combination of the rows of $B$
is automatically a linear combination of the rows of A. Thus the row space of B is 
contained in the row space of A . Since row operations are reversible, the same argument 
shows that the row space of A is a subset of the row space of B. So the two row spaces 
are the same. If B is in echelon form, its nonzero rows are linearly independent because 
no nonzero row is a linear combination of the nonzero rows below it. (Apply Theorem 
4 to the nonzero rows of B in reverse order, with the first row last.) Thus the nonzero 
rows of B form a basis of the (common) row space of B and A . ■ 

The main result of this section involves the three spaces: Row A , Col A , and Nul A . 
The following example prepares the way for this result and shows how one sequence of 
row operations on A leads to bases for all three spaces. 

EXAMPLE 2 Find bases for the row space, the column space, and the null space of
the matrix

\[
A = \begin{bmatrix} -2 & -5 & 8 & 0 & -17 \\ 1 & 3 & -5 & 1 & 5 \\ 3 & 11 & -19 & 7 & 1 \\ 1 & 7 & -13 & 5 & -3 \end{bmatrix}
\]

SOLUTION To find bases for the row space and the column space, row reduce A to an
echelon form:

\[
A \sim B = \begin{bmatrix} 1 & 3 & -5 & 1 & 5 \\ 0 & 1 & -2 & 2 & -7 \\ 0 & 0 & 0 & -4 & 20 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\]


By Theorem 13, the first three rows of B form a basis for the row space of A (as well 
as for the row space of B ). Thus 


Basis for Row A: {(1,3, -5,1,5), (0,1,-2,2, -7), (0,0,0, -4,20)} 


For the column space, observe from B that the pivots are in columns 1,2, and 4. Hence 
columns 1,2, and 4 of A (not B) form a basis for Col A: 


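\[
\text{Basis for Col } A:\quad
\left\{
\begin{bmatrix} -2 \\ 1 \\ 3 \\ 1 \end{bmatrix},
\begin{bmatrix} -5 \\ 3 \\ 11 \\ 7 \end{bmatrix},
\begin{bmatrix} 0 \\ 1 \\ 7 \\ 5 \end{bmatrix}
\right\}
\]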


Notice that any echelon form of A provides (in its nonzero rows) a basis for Row A 
and also identifies the pivot columns of A for Col A . However, for Nul A , we need the 
reduced echelon form. Further row operations on B yield 


\[
A \sim B \sim C = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & -2 & 0 & 3 \\ 0 & 0 & 0 & 1 & -5 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\]

The equation Ax = 0 is equivalent to Cx = 0, that is, 


\[
\begin{aligned}
x_1 + x_3 + x_5 &= 0 \\
x_2 - 2x_3 + 3x_5 &= 0 \\
x_4 - 5x_5 &= 0
\end{aligned}
\]

So $x_1 = -x_3 - x_5$, $x_2 = 2x_3 - 3x_5$, $x_4 = 5x_5$, with $x_3$ and $x_5$ free variables. The usual
calculations (discussed in Section 4.2) show that


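\[
\text{Basis for Nul } A:\quad
\left\{
\begin{bmatrix} -1 \\ 2 \\ 1 \\ 0 \\ 0 \end{bmatrix},
\begin{bmatrix} -1 \\ -3 \\ 0 \\ 5 \\ 1 \end{bmatrix}
\right\}
\]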

Observe that, unlike the basis for Col A , the bases for Row A and Nul A have no simple 
connection with the entries in A itself. 1 ■ 
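The calculations of Example 2 can be checked with exact rational arithmetic. The following is only a sketch: the helper name `rref` and the variable names are illustrative, not taken from the text, and the row basis it produces comes from the reduced echelon form C rather than from B (Theorem 13 allows either echelon form).

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): reduced echelon form and 0-based pivot column indices."""
    M = [[Fraction(x) for x in row] for row in rows]
    m, n = len(M), len(M[0])
    pivots, r = [], 0
    for c in range(n):
        # Look for a nonzero entry in column c, at or below row r.
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        M[r] = [x / M[r][c] for x in M[r]]          # scale the pivot to 1
        for i in range(m):
            if i != r and M[i][c] != 0:             # clear the rest of column c
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return M, pivots

A = [[-2, -5, 8, 0, -17],
     [1, 3, -5, 1, 5],
     [3, 11, -19, 7, 1],
     [1, 7, -13, 5, -3]]

C, pivots = rref(A)
n = len(A[0])

row_basis = [row for row in C if any(row)]            # nonzero rows of C
col_basis = [[row[j] for row in A] for j in pivots]   # pivot columns of A itself

# One null-space basis vector per free variable (free variable set to 1).
nul_basis = []
for f in (j for j in range(n) if j not in pivots):
    v = [Fraction(0)] * n
    v[f] = Fraction(1)
    for i, p in enumerate(pivots):
        v[p] = -C[i][f]
    nul_basis.append(v)

print("pivot columns:", [j + 1 for j in pivots])
print("Nul A basis:", [[str(x) for x in v] for v in nul_basis])
```

Exact fractions are used so that no pivot is lost or invented by roundoff.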


1 It is possible to find a basis for the row space Row A that uses rows of A. First form $A^T$, and then row
reduce until the pivot columns of $A^T$ are found. These pivot columns of $A^T$ are rows of A, and they form
a basis for the row space of A.

Warning: Although the first three rows of B in Example 2 are linearly independent,
it is wrong to conclude that the first three rows of A are linearly independent. (In fact,
the third row of A is 2 times the first row plus 7 times the second row.) Row operations
may change the linear dependence relations among the rows of a matrix.


The Rank Theorem

The next theorem describes fundamental relations among the dimensions of Col A,
Row A, and Nul A.


DEFINITION The rank of A is the dimension of the column space of A.


Since Row A is the same as Col $A^T$, the dimension of the row space of A is the rank
of $A^T$. The dimension of the null space is sometimes called the nullity of A, though we
will not use this term.

An alert reader may have already discovered part or all of the next theorem while
working the exercises in Section 4.5 or reading Example 2 above.


THEOREM 14 The Rank Theorem

The dimensions of the column space and the row space of an m × n matrix A are
equal. This common dimension, the rank of A, also equals the number of pivot
positions in A and satisfies the equation

rank A + dim Nul A = n


PROOF By Theorem 6 in Section 4.3, rank A is the number of pivot columns in A. 
Equivalently, rank A is the number of pivot positions in an echelon form B of A. 
Furthermore, since B has a nonzero row for each pivot, and since these rows form a 
basis for the row space of A, the rank of A is also the dimension of the row space. 

From Section 4.5, the dimension of Nul A equals the number of free variables in 
the equation Ax = 0. Expressed another way, the dimension of Nul A is the number of 
columns of A that are not pivot columns. (It is the number of these columns, not the 
columns themselves, that is related to Nul A.) Obviously, 

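\[
\left\{\begin{matrix}\text{number of}\\ \text{pivot columns}\end{matrix}\right\}
+ \left\{\begin{matrix}\text{number of}\\ \text{nonpivot columns}\end{matrix}\right\}
= \left\{\begin{matrix}\text{number of}\\ \text{columns}\end{matrix}\right\}
\]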

This proves the theorem. ■ 

The ideas behind Theorem 14 are visible in the calculations in Example 2. The 
three pivot positions in the echelon form B determine the basic variables and identify 
the basis vectors for Col A and those for Row A. 
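
As a worked check of the Rank Theorem on the matrix of Example 2 (using only the counts found there: pivots in columns 1, 2, and 4 of a matrix with n = 5 columns):

\[
\operatorname{rank} A = 3, \qquad \dim \operatorname{Nul} A = 2, \qquad \operatorname{rank} A + \dim \operatorname{Nul} A = 3 + 2 = 5 = n
\]

The three pivot columns account for the three basis vectors of Col A and the three nonzero rows that span Row A, while the two nonpivot columns (columns 3 and 5) correspond to the free variables $x_3$ and $x_5$.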


EXAMPLE 3

a. If A is a 7 × 9 matrix with a two-dimensional null space, what is the rank of A?

b. Could a 6 × 9 matrix have a two-dimensional null space?

SOLUTION

a. Since A has 9 columns, (rank A) + 2 = 9, and hence rank A = 7.

b. No. If a 6 × 9 matrix, call it B, had a two-dimensional null space, it would have to
have rank 7, by the Rank Theorem. But the columns of B are vectors in $\mathbb{R}^6$, and so
the dimension of Col B cannot exceed 6; that is, rank B cannot exceed 6. ■

The next example provides a nice way to visualize the subspaces we have been 
studying. In Chapter 6, we will learn that Row A and Nul A have only the zero vector 
in common and are actually “perpendicular” to each other. The same fact will apply to 
Row $A^T$ (= Col A) and Nul $A^T$. So Figure 1, which accompanies Example 4, creates
a good mental image for the general case. (The value of studying $A^T$ along with A is
demonstrated in Exercise 29.)


EXAMPLE 4 Let

\[
A = \begin{bmatrix} 3 & 0 & -1 \\ 3 & 0 & -1 \\ 4 & 0 & 5 \end{bmatrix}
\]

It is readily checked that Nul A is the $x_2$-axis, Row A is the $x_1x_3$-plane, Col A is the
plane whose equation is $x_1 - x_2 = 0$, and Nul $A^T$ is the set of all multiples of (1, -1, 0).
Figure 1 shows Nul A and Row A in the domain of the linear transformation
$\mathbf{x} \mapsto A\mathbf{x}$; the range of this mapping, Col A, is shown
in a separate copy of $\mathbb{R}^3$, along with Nul $A^T$. ■



FIGURE 1 Subspaces determined by a matrix A. 


Applications to Systems of Equations 

The Rank Theorem is a powerful tool for processing information about systems of 
linear equations. The next example simulates the way a real-life problem using linear 
equations might be stated, without explicit mention of linear algebra terms such as 
matrix, subspace, and dimension. 

EXAMPLE 5 A scientist has found two solutions to a homogeneous system of 
40 equations in 42 variables. The two solutions are not multiples, and all other solutions 
can be constructed by adding together appropriate multiples of these two solutions. 
Can the scientist be certain that an associated nonhomogeneous system (with the same 
coefficients) has a solution? 

SOLUTION Yes. Let A be the 40 × 42 coefficient matrix of the system. The given
information implies that the two solutions are linearly independent and span Nul A.
So dim Nul A = 2. By the Rank Theorem, dim Col A = 42 - 2 = 40. Since $\mathbb{R}^{40}$ is the
only subspace of $\mathbb{R}^{40}$ whose dimension is 40, Col A must be all of $\mathbb{R}^{40}$. This means that
every nonhomogeneous equation Ax = b has a solution. ■

Rank and the Invertible Matrix Theorem

The various vector space concepts associated with a matrix provide several more 
statements for the Invertible Matrix Theorem. The new statements listed here follow 
those in the original Invertible Matrix Theorem in Section 2.3. 


THEOREM The Invertible Matrix Theorem (continued)

Let A be an n × n matrix. Then the following statements are each equivalent to
the statement that A is an invertible matrix.

m. The columns of A form a basis of $\mathbb{R}^n$.

n. Col A = $\mathbb{R}^n$

o. dim Col A = n

p. rank A = n

q. Nul A = {0}

r. dim Nul A = 0


PROOF Statement (m) is logically equivalent to statements (e) and (h) regarding linear 
independence and spanning. The other five statements are linked to the earlier ones of 
the theorem by the following chain of almost trivial implications: 

(g) ⇒ (n) ⇒ (o) ⇒ (p) ⇒ (r) ⇒ (q) ⇒ (d)

Statement (g), which says that the equation Ax = b has at least one solution for each b in
$\mathbb{R}^n$, implies (n), because Col A is precisely the set of all b such that the equation Ax = b
is consistent. The implications (n) ⇒ (o) ⇒ (p) follow from the definitions of dimension
and rank. If the rank of A is n, the number of columns of A, then dim Nul A = 0, by the
Rank Theorem, and so Nul A = {0}. Thus (p) ⇒ (r) ⇒ (q). Also, (q) implies that the
equation Ax = 0 has only the trivial solution, which is statement (d). Since statements 
(d) and (g) are already known to be equivalent to the statement that A is invertible, the 
proof is complete. ■ 


We have refrained from adding to the Invertible Matrix Theorem obvious statements
about the row space of A, because the row space is the column space of $A^T$.
Recall from statement (l) of the Invertible Matrix Theorem that A is invertible if and
only if $A^T$ is invertible. Hence every statement in the Invertible Matrix Theorem can
also be stated for $A^T$. To do so would double the length of the theorem and produce a
list of more than 30 statements!


NUMERICAL NOTE

Many algorithms discussed in this text are useful for understanding concepts
and making simple computations by hand. However, the algorithms are often
unsuitable for large-scale problems in real life.

Rank determination is a good example. It would seem easy to reduce a matrix
to echelon form and count the pivots. But unless exact arithmetic is performed
on a matrix whose entries are specified exactly, row operations can change the
apparent rank of a matrix. For instance, if the value of x in the matrix

\[
\begin{bmatrix} 5 & 7 \\ 5 & x \end{bmatrix}
\]

is not stored exactly as 7 in a computer, then the rank may be 1 or 2, depending
on whether the computer treats x - 7 as zero.

In practical applications, the effective rank of a matrix A is often determined
from the singular value decomposition of A, to be discussed in Section 7.4. This
decomposition is also a reliable source of bases for Col A, Row A, Nul A, and
Nul $A^T$.
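
The effect described in this note can be reproduced with a small floating-point experiment. This is a minimal sketch, not a recommended rank algorithm: the function `count_pivots`, its `tol` parameter, and the perturbed value of x are all assumptions made for illustration.

```python
def count_pivots(rows, tol=0.0):
    """Count pivots from forward elimination, treating |entry| <= tol as zero."""
    M = [list(map(float, row)) for row in rows]
    m, n = len(M), len(M[0])
    r = 0
    for c in range(n):
        # Partial pivoting: largest entry in column c, at or below row r.
        p = max(range(r, m), key=lambda i: abs(M[i][c]))
        if abs(M[p][c]) <= tol:
            continue                      # no usable pivot in this column
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, m):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == m:
            break
    return r

x = 7 + 1e-12                # hypothetical stored value, slightly different from 7
A = [[5, 7], [5, x]]

print(count_pivots(A, tol=0.0))     # x - 7 is treated as nonzero
print(count_pivots(A, tol=1e-9))    # x - 7 is treated as zero
```

With a zero tolerance the tiny difference x - 7 counts as a pivot and the apparent rank is 2; with a tolerance of 1e-9 the same matrix appears to have rank 1.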


PRACTICE PROBLEMS 


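The matrices below are row equivalent.

\[
A = \begin{bmatrix} 2 & -1 & 1 & -6 & 8 \\ 1 & -2 & -4 & 3 & -2 \\ -7 & 8 & 10 & 3 & -10 \\ 4 & -5 & -7 & 0 & 4 \end{bmatrix},
\qquad
B = \begin{bmatrix} 1 & -2 & -4 & 3 & -2 \\ 0 & 3 & 9 & -12 & 12 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\]
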

1. Find rank A and dim Nul A . 

2. Find bases for Col A and Row A . 

3. What is the next step to perform to find a basis for Nul A ? 

4. How many pivot columns are in a row echelon form of A T ? 


4.6 EXERCISES 


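In Exercises 1-4, assume that the matrix A is row equivalent to B.
Without calculations, list rank A and dim Nul A. Then find bases
for Col A, Row A, and Nul A.

1. \[
A = \begin{bmatrix} 1 & -4 & 9 & -7 \\ -1 & 2 & -4 & 1 \\ 5 & -6 & 10 & 7 \end{bmatrix},
\qquad
B = \begin{bmatrix} 1 & 0 & -1 & 5 \\ 0 & -2 & 5 & -6 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]

2. \[
A = \begin{bmatrix} 1 & -3 & 4 & -1 & 9 \\ -2 & 6 & -6 & -1 & -10 \\ -3 & 9 & -6 & -6 & -3 \\ 3 & -9 & 4 & 9 & 0 \end{bmatrix},
\qquad
B = \begin{bmatrix} 1 & -3 & 0 & 5 & -7 \\ 0 & 0 & 2 & -3 & 8 \\ 0 & 0 & 0 & 0 & 5 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\]

3. \[
A = \begin{bmatrix} 2 & -3 & 6 & 2 & 5 \\ -2 & 3 & -3 & -3 & -4 \\ 4 & -6 & 9 & 5 & 9 \\ -2 & 3 & 3 & -4 & 1 \end{bmatrix},
\qquad
B = \begin{bmatrix} 2 & -3 & 6 & 2 & 5 \\ 0 & 0 & 3 & -1 & 1 \\ 0 & 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\]

4. \[
A = \begin{bmatrix} 1 & 1 & -3 & 7 & 9 & -9 \\ 1 & 2 & -4 & 10 & 13 & -12 \\ 1 & -1 & -1 & 1 & 1 & -3 \\ 1 & -3 & 1 & -5 & -7 & 3 \\ 1 & -2 & 0 & 0 & -5 & -4 \end{bmatrix},
\qquad
B = \begin{bmatrix} 1 & 1 & -3 & 7 & 9 & -9 \\ 0 & 1 & -1 & 3 & 4 & -3 \\ 0 & 0 & 0 & 1 & -1 & -2 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\]

5. If a 3 × 8 matrix A has rank 3, find dim Nul A, dim Row A,
and rank $A^T$.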

6. If a 6 × 3 matrix A has rank 3, find dim Nul A, dim Row A,
and rank $A^T$.

7. Suppose a 4 × 7 matrix A has four pivot columns. Is
Col A = $\mathbb{R}^4$? Is Nul A = $\mathbb{R}^3$? Explain your answers.

8. Suppose a 5 × 6 matrix A has four pivot columns. What is
dim Nul A? Is Col A = $\mathbb{R}^4$? Why or why not?

9. If the null space of a 5 x 6 matrix A is 4-dimensional, what 
is the dimension of the column space of A? 

10. If the null space of a 7 x 6 matrix A is 5-dimensional, what 
is the dimension of the column space of A? 

11. If the null space of an 8 x 5 matrix A is 2-dimensional, what 
is the dimension of the row space of A? 

12. If the null space of a 5 x 6 matrix A is 4-dimensional, what 
is the dimension of the row space of A? 

13. If A is a 7 x 5 matrix, what is the largest possible rank of A? 
If A is a 5 x 7 matrix, what is the largest possible rank of A? 
Explain your answers. 

14. If A is a 4 x 3 matrix, what is the largest possible dimension 
of the row space of A? If A is a 3 x 4 matrix, what is the 
largest possible dimension of the row space of A? Explain. 

15. If A is a 6 × 8 matrix, what is the smallest possible dimension
of Nul A?

16. If A is a 6 x 4 matrix, what is the smallest possible dimension 
of Nul A? 

In Exercises 17 and 18, A is an m x n matrix. Mark each statement 
True or False. Justify each answer. 

17. a. The row space of A is the same as the column space of
$A^T$.

b. If B is any echelon form of A, and if B has three nonzero 
rows, then the first three rows of A form a basis for 
Row A. 

c. The dimensions of the row space and the column space 
of A are the same, even if A is not square. 

d. The sum of the dimensions of the row space and the null 
space of A equals the number of rows in A. 

e. On a computer, row operations can change the apparent 
rank of a matrix. 

18. a. If B is any echelon form of A, then the pivot columns of
B form a basis for the column space of A.

b. Row operations preserve the linear dependence relations 
among the rows of A . 

c. The dimension of the null space of A is the number of 
columns of A that are not pivot columns. 

d. The row space of $A^T$ is the same as the column space of
A.


e. If A and B are row equivalent, then their row spaces are 
the same. 

19. Suppose the solutions of a homogeneous system of five linear 
equations in six unknowns are all multiples of one nonzero 
solution. Will the system necessarily have a solution for 
every possible choice of constants on the right sides of the 
equations? Explain. 

20. Suppose a nonhomogeneous system of six linear equations 
in eight unknowns has a solution, with two free variables. Is 
it possible to change some constants on the equations’ right 
sides to make the new system inconsistent? Explain. 

21. Suppose a nonhomogeneous system of nine linear equations 
in ten unknowns has a solution for all possible constants on 
the right sides of the equations. Is it possible to find two 
nonzero solutions of the associated homogeneous system that 
are not multiples of each other? Discuss. 

22. Is it possible that all solutions of a homogeneous system of 
ten linear equations in twelve variables are multiples of one 
fixed nonzero solution? Discuss. 

23. A homogeneous system of twelve linear equations in eight 
unknowns has two fixed solutions that are not multiples of 
each other, and all other solutions are linear combinations of 
these two solutions. Can the set of all solutions be described 
with fewer than twelve homogeneous linear equations? If so, 
how many? Discuss. 

24. Is it possible for a nonhomogeneous system of seven equa¬ 
tions in six unknowns to have a unique solution for some 
right-hand side of constants? Is it possible for such a system 
to have a unique solution for every right-hand side? Explain. 

25. A scientist solves a nonhomogeneous system of ten linear 
equations in twelve unknowns and finds that three of the 
unknowns are free variables. Can the scientist be certain 
that, if the right sides of the equations are changed, the new 
nonhomogeneous system will have a solution? Discuss. 

26. In statistical theory, a common requirement is that a matrix 
be of full rank. That is, the rank should be as large as 
possible. Explain why an m x n matrix with more rows than 
columns has full rank if and only if its columns are linearly 
independent. 

Exercises 27-29 concern an m × n matrix A and what are often
called the fundamental subspaces determined by A.

27. Which of the subspaces Row A, Col A, Nul A, Row $A^T$,
Col $A^T$, and Nul $A^T$ are in $\mathbb{R}^m$ and which are in $\mathbb{R}^n$? How
many distinct subspaces are in this list?

28. Justify the following equalities:

a. dim Row A + dim Nul A = n (number of columns of A)

b. dim Col A + dim Nul $A^T$ = m (number of rows of A)

29. Use Exercise 28 to explain why the equation Ax = b has a
solution for all b in $\mathbb{R}^m$ if and only if the equation $A^T\mathbf{x} = \mathbf{0}$
has only the trivial solution.

30. Suppose A is m × n and b is in $\mathbb{R}^m$. What has to be true about
the two numbers rank [ A b ] and rank A in order for the
equation Ax = b to be consistent?


Rank 1 matrices are important in some computer algorithms and
several theoretical contexts, including the singular value decomposition
in Chapter 7. It can be shown that an m × n matrix A
has rank 1 if and only if it is an outer product; that is, $A = \mathbf{u}\mathbf{v}^T$
for some u in $\mathbb{R}^m$ and v in $\mathbb{R}^n$. Exercises 31-33 suggest why this
property is true.

31. Verify that rank $\mathbf{u}\mathbf{v}^T \le 1$ if
$\mathbf{u} = \begin{bmatrix} 2 \\ -3 \\ 5 \end{bmatrix}$ and
$\mathbf{v} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$.


32. Let u =



. Find v in $\mathbb{R}^3$ such that




33. Let A be any 2 × 3 matrix such that rank A = 1, let u be the
first column of A, and suppose u ≠ 0. Explain why there
is a vector v in $\mathbb{R}^3$ such that $A = \mathbf{u}\mathbf{v}^T$. How could this
construction be modified if the first column of A were zero?


34. Let A be an m × n matrix of rank r > 0 and let U be an echelon
form of A. Explain why there exists an invertible matrix
E such that A = EU, and use this factorization to write A
as the sum of r rank 1 matrices. [Hint: See Theorem 10 in
Section 2.4.]


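35. [M] Let
\[
A = \begin{bmatrix}
7 & -9 & -4 & 5 & 3 & -3 & -7 \\
-4 & 6 & 7 & -2 & -6 & -5 & 5 \\
5 & -7 & -6 & 5 & -6 & 2 & 8 \\
-3 & 5 & 8 & -1 & -7 & -4 & 8 \\
6 & -8 & -5 & 4 & 4 & 9 & 3
\end{bmatrix}
\]
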

a. Construct matrices C and N whose columns are bases for 
Col A and Nul A, respectively, and construct a matrix R 
whose rows form a basis for Row A. 


b. Construct a matrix M whose columns form a basis
for Nul $A^T$, form the matrices S = [ $R^T$ N ] and
T = [ C M ], and explain why S and T should be
square. Verify that both S and T are invertible.


36. [M] Repeat Exercise 35 for a random integer-valued 6 × 7
matrix A whose rank is at most 4. One way to make A
is to create a random integer-valued 6 × 4 matrix J and a
random integer-valued 4 × 7 matrix K, and set A = JK.
(See Supplementary Exercise 12 at the end of the chapter;
and see the Study Guide for matrix-generating programs.)


37. [M] Let A be the matrix in Exercise 35. Construct a matrix 
C whose columns are the pivot columns of A, and construct 
a matrix R whose rows are the nonzero rows of the reduced 
echelon form of A. Compute CR, and discuss what you see. 


38. [M] Repeat Exercise 37 for three random integer-valued
5 × 7 matrices A whose ranks are 5, 4, and 3. Make a conjecture
about how CR is related to A for any matrix A. Prove
your conjecture.


SOLUTIONS TO PRACTICE PROBLEMS 


1. A has two pivot columns, so rank A = 2. Since A has 5 columns altogether,
dim Nul A = 5 - 2 = 3.

2. The pivot columns of A are the first two columns. So a basis for Col A is

\[
\{\mathbf{a}_1, \mathbf{a}_2\} = \left\{
\begin{bmatrix} 2 \\ 1 \\ -7 \\ 4 \end{bmatrix},
\begin{bmatrix} -1 \\ -2 \\ 8 \\ -5 \end{bmatrix}
\right\}
\]

The nonzero rows of B form a basis for Row A, namely, {(1, -2, -4, 3, -2),
(0, 3, 9, -12, 12)}. In this particular example, it happens that any two rows of A
form a basis for the row space, because the row space is two-dimensional and none
of the rows of A is a multiple of another row. In general, the nonzero rows of an
echelon form of A should be used as a basis for Row A, not the rows of A itself.

3. For Nul A, the next step is to perform row operations on B to obtain the reduced
echelon form of A.

4. Rank $A^T$ = rank A, by the Rank Theorem, because Col $A^T$ = Row A. So $A^T$ has
two pivot positions.

4.7 CHANGE OF BASIS


When a basis $\mathcal{B}$ is chosen for an n-dimensional vector space V, the associated coordinate
mapping onto $\mathbb{R}^n$ provides a coordinate system for V. Each x in V is identified uniquely
by its $\mathcal{B}$-coordinate vector $[\mathbf{x}]_{\mathcal{B}}$.¹

In some applications, a problem is described initially using a basis $\mathcal{B}$, but the
problem's solution is aided by changing $\mathcal{B}$ to a new basis $\mathcal{C}$. (Examples will be given in
Chapters 5 and 7.) Each vector is assigned a new $\mathcal{C}$-coordinate vector. In this section,
we study how $[\mathbf{x}]_{\mathcal{C}}$ and $[\mathbf{x}]_{\mathcal{B}}$ are related for each x in V.

To visualize the problem, consider the two coordinate systems in Figure 1. In
Figure 1(a), $\mathbf{x} = 3\mathbf{b}_1 + \mathbf{b}_2$, while in Figure 1(b), the same x is shown as $\mathbf{x} = 6\mathbf{c}_1 + 4\mathbf{c}_2$.
That is,

\[
[\mathbf{x}]_{\mathcal{B}} = \begin{bmatrix} 3 \\ 1 \end{bmatrix}
\quad\text{and}\quad
[\mathbf{x}]_{\mathcal{C}} = \begin{bmatrix} 6 \\ 4 \end{bmatrix}
\]

Our problem is to find the connection between the two coordinate vectors. Example 1
shows how to do this, provided we know how $\mathbf{b}_1$ and $\mathbf{b}_2$ are formed from $\mathbf{c}_1$ and $\mathbf{c}_2$.

FIGURE 1 Two coordinate systems for the same vector space.


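EXAMPLE 1 Consider two bases $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$ and $\mathcal{C} = \{\mathbf{c}_1, \mathbf{c}_2\}$ for a vector space
V, such that

\[
\mathbf{b}_1 = 4\mathbf{c}_1 + \mathbf{c}_2 \quad\text{and}\quad \mathbf{b}_2 = -6\mathbf{c}_1 + \mathbf{c}_2 \tag{1}
\]

Suppose

\[
\mathbf{x} = 3\mathbf{b}_1 + \mathbf{b}_2 \tag{2}
\]

That is, suppose $[\mathbf{x}]_{\mathcal{B}} = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$. Find $[\mathbf{x}]_{\mathcal{C}}$.

SOLUTION Apply the coordinate mapping determined by $\mathcal{C}$ to x in (2). Since the
coordinate mapping is a linear transformation,

\[
[\mathbf{x}]_{\mathcal{C}} = [3\mathbf{b}_1 + \mathbf{b}_2]_{\mathcal{C}} = 3[\mathbf{b}_1]_{\mathcal{C}} + [\mathbf{b}_2]_{\mathcal{C}}
\]

We can write this vector equation as a matrix equation, using the vectors in the linear
combination as the columns of a matrix:

\[
[\mathbf{x}]_{\mathcal{C}} = \begin{bmatrix} [\mathbf{b}_1]_{\mathcal{C}} & [\mathbf{b}_2]_{\mathcal{C}} \end{bmatrix} \begin{bmatrix} 3 \\ 1 \end{bmatrix} \tag{3}
\]

This formula gives $[\mathbf{x}]_{\mathcal{C}}$, once we know the columns of the matrix. From (1),

\[
[\mathbf{b}_1]_{\mathcal{C}} = \begin{bmatrix} 4 \\ 1 \end{bmatrix}
\quad\text{and}\quad
[\mathbf{b}_2]_{\mathcal{C}} = \begin{bmatrix} -6 \\ 1 \end{bmatrix}
\]

Thus (3) provides the solution:

\[
[\mathbf{x}]_{\mathcal{C}} = \begin{bmatrix} 4 & -6 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 3 \\ 1 \end{bmatrix} = \begin{bmatrix} 6 \\ 4 \end{bmatrix}
\]

The $\mathcal{C}$-coordinates of x match those of the x in Figure 1. ■

1 Think of $[\mathbf{x}]_{\mathcal{B}}$ as a “name” for x that lists the weights used to build x as a linear combination of the basis
vectors in $\mathcal{B}$.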


The argument used to derive formula (3) can be generalized to yield the following 
result. (See Exercises 15 and 16.) 


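THEOREM 15 Let $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ and $\mathcal{C} = \{\mathbf{c}_1, \ldots, \mathbf{c}_n\}$ be bases of a vector space V. Then
there is a unique n × n matrix $P_{\mathcal{C}\leftarrow\mathcal{B}}$ such that

\[
[\mathbf{x}]_{\mathcal{C}} = P_{\mathcal{C}\leftarrow\mathcal{B}}\,[\mathbf{x}]_{\mathcal{B}} \tag{4}
\]

The columns of $P_{\mathcal{C}\leftarrow\mathcal{B}}$ are the $\mathcal{C}$-coordinate vectors of the vectors in the basis $\mathcal{B}$.
That is,

\[
P_{\mathcal{C}\leftarrow\mathcal{B}} = \begin{bmatrix} [\mathbf{b}_1]_{\mathcal{C}} & [\mathbf{b}_2]_{\mathcal{C}} & \cdots & [\mathbf{b}_n]_{\mathcal{C}} \end{bmatrix} \tag{5}
\]


The matrix $P_{\mathcal{C}\leftarrow\mathcal{B}}$ in Theorem 15 is called the change-of-coordinates matrix from
$\mathcal{B}$ to $\mathcal{C}$. Multiplication by $P_{\mathcal{C}\leftarrow\mathcal{B}}$ converts $\mathcal{B}$-coordinates into $\mathcal{C}$-coordinates.² Figure 2
illustrates the change-of-coordinates equation (4).

FIGURE 2 Two coordinate systems for V.


The columns of $P_{\mathcal{C}\leftarrow\mathcal{B}}$ are linearly independent because they are the coordinate
vectors of the linearly independent set $\mathcal{B}$. (See Exercise 25 in Section 4.4.) Since $P_{\mathcal{C}\leftarrow\mathcal{B}}$
is square, it must be invertible, by the Invertible Matrix Theorem. Left-multiplying both
sides of equation (4) by $(P_{\mathcal{C}\leftarrow\mathcal{B}})^{-1}$ yields

\[
(P_{\mathcal{C}\leftarrow\mathcal{B}})^{-1}[\mathbf{x}]_{\mathcal{C}} = [\mathbf{x}]_{\mathcal{B}}
\]

Thus $(P_{\mathcal{C}\leftarrow\mathcal{B}})^{-1}$ is the matrix that converts $\mathcal{C}$-coordinates into $\mathcal{B}$-coordinates. That is,

\[
(P_{\mathcal{C}\leftarrow\mathcal{B}})^{-1} = P_{\mathcal{B}\leftarrow\mathcal{C}} \tag{6}
\]

2 To remember how to construct the matrix, think of $P_{\mathcal{C}\leftarrow\mathcal{B}}[\mathbf{x}]_{\mathcal{B}}$ as a linear combination of the columns of
$P_{\mathcal{C}\leftarrow\mathcal{B}}$. The matrix-vector product is a $\mathcal{C}$-coordinate vector, so the columns of $P_{\mathcal{C}\leftarrow\mathcal{B}}$ should be $\mathcal{C}$-coordinate
vectors, too.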

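Change of Basis in $\mathbb{R}^n$

If $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ and $\mathcal{E}$ is the standard basis $\{\mathbf{e}_1, \ldots, \mathbf{e}_n\}$ in $\mathbb{R}^n$, then $[\mathbf{b}_1]_{\mathcal{E}} = \mathbf{b}_1$,
and likewise for the other vectors in $\mathcal{B}$. In this case, $P_{\mathcal{E}\leftarrow\mathcal{B}}$ is the same as the change-of-coordinates
matrix $P_{\mathcal{B}}$ introduced in Section 4.4, namely,

\[
P_{\mathcal{B}} = \begin{bmatrix} \mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_n \end{bmatrix}
\]

To change coordinates between two nonstandard bases in $\mathbb{R}^n$, we need Theorem 15.
The theorem shows that to solve the change-of-basis problem, we need the coordinate
vectors of the old basis relative to the new basis.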


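EXAMPLE 2 Let $\mathbf{b}_1 = \begin{bmatrix} -9 \\ 1 \end{bmatrix}$, $\mathbf{b}_2 = \begin{bmatrix} -5 \\ -1 \end{bmatrix}$, $\mathbf{c}_1 = \begin{bmatrix} 1 \\ -4 \end{bmatrix}$, $\mathbf{c}_2 = \begin{bmatrix} 3 \\ -5 \end{bmatrix}$, and consider
the bases for $\mathbb{R}^2$ given by $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$ and $\mathcal{C} = \{\mathbf{c}_1, \mathbf{c}_2\}$. Find the change-of-coordinates
matrix from $\mathcal{B}$ to $\mathcal{C}$.

SOLUTION The matrix $P_{\mathcal{C}\leftarrow\mathcal{B}}$ involves the $\mathcal{C}$-coordinate vectors of $\mathbf{b}_1$ and $\mathbf{b}_2$. Let
$[\mathbf{b}_1]_{\mathcal{C}} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ and $[\mathbf{b}_2]_{\mathcal{C}} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$. Then, by definition,

\[
\begin{bmatrix} \mathbf{c}_1 & \mathbf{c}_2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \mathbf{b}_1
\quad\text{and}\quad
\begin{bmatrix} \mathbf{c}_1 & \mathbf{c}_2 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \mathbf{b}_2
\]

To solve both systems simultaneously, augment the coefficient matrix with $\mathbf{b}_1$ and $\mathbf{b}_2$,
and row reduce:

\[
\begin{bmatrix} \mathbf{c}_1 & \mathbf{c}_2 & \mathbf{b}_1 & \mathbf{b}_2 \end{bmatrix}
= \left[\begin{array}{rr|rr} 1 & 3 & -9 & -5 \\ -4 & -5 & 1 & -1 \end{array}\right]
\sim \left[\begin{array}{rr|rr} 1 & 0 & 6 & 4 \\ 0 & 1 & -5 & -3 \end{array}\right] \tag{7}
\]

Thus

\[
[\mathbf{b}_1]_{\mathcal{C}} = \begin{bmatrix} 6 \\ -5 \end{bmatrix}
\quad\text{and}\quad
[\mathbf{b}_2]_{\mathcal{C}} = \begin{bmatrix} 4 \\ -3 \end{bmatrix}
\]

The desired change-of-coordinates matrix is therefore

\[
P_{\mathcal{C}\leftarrow\mathcal{B}} = \begin{bmatrix} [\mathbf{b}_1]_{\mathcal{C}} & [\mathbf{b}_2]_{\mathcal{C}} \end{bmatrix} = \begin{bmatrix} 6 & 4 \\ -5 & -3 \end{bmatrix}
\]
■

Observe that the matrix $P_{\mathcal{C}\leftarrow\mathcal{B}}$ in Example 2 already appeared in (7). This is not
surprising because the first column of $P_{\mathcal{C}\leftarrow\mathcal{B}}$ results from row reducing $[\,\mathbf{c}_1 \ \mathbf{c}_2 \mid \mathbf{b}_1\,]$ to
$[\,I \mid [\mathbf{b}_1]_{\mathcal{C}}\,]$, and similarly for the second column of $P_{\mathcal{C}\leftarrow\mathcal{B}}$. Thus

\[
\begin{bmatrix} \mathbf{c}_1 & \mathbf{c}_2 \mid \mathbf{b}_1 & \mathbf{b}_2 \end{bmatrix} \sim \begin{bmatrix} I \mid P_{\mathcal{C}\leftarrow\mathcal{B}} \end{bmatrix}
\]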

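An analogous procedure works for finding the change-of-coordinates matrix between
any two bases in $\mathbb{R}^n$.

EXAMPLE 3 Let $\mathbf{b}_1 = \begin{bmatrix} 1 \\ -3 \end{bmatrix}$, $\mathbf{b}_2 = \begin{bmatrix} -2 \\ 4 \end{bmatrix}$, $\mathbf{c}_1 = \begin{bmatrix} -7 \\ 9 \end{bmatrix}$, $\mathbf{c}_2 = \begin{bmatrix} -5 \\ 7 \end{bmatrix}$, and consider
the bases for $\mathbb{R}^2$ given by $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$ and $\mathcal{C} = \{\mathbf{c}_1, \mathbf{c}_2\}$.

a. Find the change-of-coordinates matrix from $\mathcal{C}$ to $\mathcal{B}$.

b. Find the change-of-coordinates matrix from $\mathcal{B}$ to $\mathcal{C}$.

SOLUTION

a. Notice that $P_{\mathcal{B}\leftarrow\mathcal{C}}$ is needed rather than $P_{\mathcal{C}\leftarrow\mathcal{B}}$, and compute

\[
\begin{bmatrix} \mathbf{b}_1 & \mathbf{b}_2 & \mathbf{c}_1 & \mathbf{c}_2 \end{bmatrix}
= \left[\begin{array}{rr|rr} 1 & -2 & -7 & -5 \\ -3 & 4 & 9 & 7 \end{array}\right]
\sim \left[\begin{array}{rr|rr} 1 & 0 & 5 & 3 \\ 0 & 1 & 6 & 4 \end{array}\right]
\]

So

\[
P_{\mathcal{B}\leftarrow\mathcal{C}} = \begin{bmatrix} 5 & 3 \\ 6 & 4 \end{bmatrix}
\]

b. By part (a) and property (6) above (with $\mathcal{B}$ and $\mathcal{C}$ interchanged),

\[
P_{\mathcal{C}\leftarrow\mathcal{B}} = (P_{\mathcal{B}\leftarrow\mathcal{C}})^{-1}
= \frac{1}{2}\begin{bmatrix} 4 & -3 \\ -6 & 5 \end{bmatrix}
= \begin{bmatrix} 2 & -3/2 \\ -3 & 5/2 \end{bmatrix}
\]
■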

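Another description of the change-of-coordinates matrix $P_{\mathcal{C}\leftarrow\mathcal{B}}$ uses the change-of-coordinate
matrices $P_{\mathcal{B}}$ and $P_{\mathcal{C}}$ that convert $\mathcal{B}$-coordinates and $\mathcal{C}$-coordinates, respectively,
into standard coordinates. Recall that for each x in $\mathbb{R}^n$,

\[
P_{\mathcal{B}}[\mathbf{x}]_{\mathcal{B}} = \mathbf{x}, \quad P_{\mathcal{C}}[\mathbf{x}]_{\mathcal{C}} = \mathbf{x}, \quad\text{and}\quad [\mathbf{x}]_{\mathcal{C}} = P_{\mathcal{C}}^{-1}\mathbf{x}
\]

Thus

\[
[\mathbf{x}]_{\mathcal{C}} = P_{\mathcal{C}}^{-1}\mathbf{x} = P_{\mathcal{C}}^{-1}P_{\mathcal{B}}[\mathbf{x}]_{\mathcal{B}}
\]

In $\mathbb{R}^n$, the change-of-coordinates matrix $P_{\mathcal{C}\leftarrow\mathcal{B}}$ may be computed as $P_{\mathcal{C}}^{-1}P_{\mathcal{B}}$. Actually,
for matrices larger than 2 × 2, an algorithm analogous to the one in Example 3 is faster
than computing $P_{\mathcal{C}}^{-1}$ and then $P_{\mathcal{C}}^{-1}P_{\mathcal{B}}$. See Exercise 12 in Section 2.2.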
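
For the 2 × 2 bases of Example 3, the product $P_{\mathcal{C}}^{-1}P_{\mathcal{B}}$ can be checked directly. This is a small sketch under stated assumptions: `inv2` and `matmul` are hypothetical helpers written for this check, and the 2 × 2 inverse formula is used only because the matrices are small.

```python
from fractions import Fraction

def inv2(M):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(X, Y):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# Columns are the basis vectors of Example 3.
P_B = [[1, -2], [-3, 4]]      # [b1 b2]
P_C = [[-7, -5], [9, 7]]      # [c1 c2]

P_C_from_B = matmul(inv2(P_C), P_B)
print([[str(x) for x in row] for row in P_C_from_B])
```

The result agrees with the matrix found in Example 3(b) by inverting $P_{\mathcal{B}\leftarrow\mathcal{C}}$.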


PRACTICE PROBLEMS 



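1. Let $\mathcal{F} = \{\mathbf{f}_1, \mathbf{f}_2\}$ and $\mathcal{G} = \{\mathbf{g}_1, \mathbf{g}_2\}$ be bases for a vector space V, and let P be a matrix
whose columns are $[\mathbf{f}_1]_{\mathcal{G}}$ and $[\mathbf{f}_2]_{\mathcal{G}}$. Which of the following equations is satisfied
by P for all v in V?

(i) $[\mathbf{v}]_{\mathcal{F}} = P[\mathbf{v}]_{\mathcal{G}}$

(ii) $[\mathbf{v}]_{\mathcal{G}} = P[\mathbf{v}]_{\mathcal{F}}$

2. Let $\mathcal{B}$ and $\mathcal{C}$ be as in Example 1. Use the results of that example to find the
change-of-coordinates matrix from $\mathcal{C}$ to $\mathcal{B}$.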


4.7 EXERCISES 


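1. Let $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$ and $\mathcal{C} = \{\mathbf{c}_1, \mathbf{c}_2\}$ be bases for a vector space
V, and suppose $\mathbf{b}_1 = 6\mathbf{c}_1 - 2\mathbf{c}_2$ and $\mathbf{b}_2 = 9\mathbf{c}_1 - 4\mathbf{c}_2$.

a. Find the change-of-coordinates matrix from $\mathcal{B}$ to $\mathcal{C}$.

b. Find $[\mathbf{x}]_{\mathcal{C}}$ for $\mathbf{x} = -3\mathbf{b}_1 + 2\mathbf{b}_2$. Use part (a).


2. Let $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$ and $\mathcal{C} = \{\mathbf{c}_1, \mathbf{c}_2\}$ be bases for a vector space
V, and suppose $\mathbf{b}_1 = -\mathbf{c}_1 + 4\mathbf{c}_2$ and $\mathbf{b}_2 = 5\mathbf{c}_1 - 3\mathbf{c}_2$.

a. Find the change-of-coordinates matrix from $\mathcal{B}$ to $\mathcal{C}$.

b. Find $[\mathbf{x}]_{\mathcal{C}}$ for $\mathbf{x} = 5\mathbf{b}_1 + 3\mathbf{b}_2$.

3. Let $\mathcal{U} = \{\mathbf{u}_1, \mathbf{u}_2\}$ and $\mathcal{W} = \{\mathbf{w}_1, \mathbf{w}_2\}$ be bases for V, and let
P be a matrix whose columns are $[\mathbf{u}_1]_{\mathcal{W}}$ and $[\mathbf{u}_2]_{\mathcal{W}}$. Which
of the following equations is satisfied by P for all x in V?

(i) $[\mathbf{x}]_{\mathcal{U}} = P[\mathbf{x}]_{\mathcal{W}}$ (ii) $[\mathbf{x}]_{\mathcal{W}} = P[\mathbf{x}]_{\mathcal{U}}$

4. Let $\mathcal{A} = \{\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3\}$ and $\mathcal{D} = \{\mathbf{d}_1, \mathbf{d}_2, \mathbf{d}_3\}$ be bases for V,
and let $P = [\,[\mathbf{d}_1]_{\mathcal{A}} \ [\mathbf{d}_2]_{\mathcal{A}} \ [\mathbf{d}_3]_{\mathcal{A}}\,]$. Which of the following
equations is satisfied by P for all x in V?

(i) $[\mathbf{x}]_{\mathcal{A}} = P[\mathbf{x}]_{\mathcal{D}}$ (ii) $[\mathbf{x}]_{\mathcal{D}} = P[\mathbf{x}]_{\mathcal{A}}$

5. Let $\mathcal{A} = \{\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3\}$ and $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3\}$ be bases
for a vector space V, and suppose $\mathbf{a}_1 = 4\mathbf{b}_1 - \mathbf{b}_2$,
$\mathbf{a}_2 = -\mathbf{b}_1 + \mathbf{b}_2 + \mathbf{b}_3$, and $\mathbf{a}_3 = \mathbf{b}_2 - 2\mathbf{b}_3$.

a. Find the change-of-coordinates matrix from $\mathcal{A}$ to $\mathcal{B}$.

b. Find $[\mathbf{x}]_{\mathcal{B}}$ for $\mathbf{x} = 3\mathbf{a}_1 + 4\mathbf{a}_2 + \mathbf{a}_3$.

6. Let $\mathcal{D} = \{\mathbf{d}_1, \mathbf{d}_2, \mathbf{d}_3\}$ and $\mathcal{F} = \{\mathbf{f}_1, \mathbf{f}_2, \mathbf{f}_3\}$ be bases for
a vector space V, and suppose $\mathbf{f}_1 = 2\mathbf{d}_1 - \mathbf{d}_2 + \mathbf{d}_3$,
$\mathbf{f}_2 = 3\mathbf{d}_2 + \mathbf{d}_3$, and $\mathbf{f}_3 = -3\mathbf{d}_1 + 2\mathbf{d}_3$.

a. Find the change-of-coordinates matrix from $\mathcal{F}$ to $\mathcal{D}$.

b. Find $[\mathbf{x}]_{\mathcal{D}}$ for $\mathbf{x} = \mathbf{f}_1 - 2\mathbf{f}_2 + 2\mathbf{f}_3$.

In Exercises 7-10, let $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$ and $\mathcal{C} = \{\mathbf{c}_1, \mathbf{c}_2\}$ be bases for
$\mathbb{R}^2$. In each exercise, find the change-of-coordinates matrix from
$\mathcal{B}$ to $\mathcal{C}$ and the change-of-coordinates matrix from $\mathcal{C}$ to $\mathcal{B}$.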


7. bi 

8. bi 

9. bi 

10. bi 



In Exercises 11 and 12, $\mathcal{B}$ and $\mathcal{C}$ are bases for a vector space V.
Mark each statement True or False. Justify each answer.

11. a. The columns of the change-of-coordinates matrix $P_{\mathcal{C}\leftarrow\mathcal{B}}$
are $\mathcal{B}$-coordinate vectors of the vectors in $\mathcal{C}$.

b. If V = $\mathbb{R}^n$ and $\mathcal{C}$ is the standard basis for V, then $P_{\mathcal{C}\leftarrow\mathcal{B}}$
is the same as the change-of-coordinates matrix $P_{\mathcal{B}}$ introduced
in Section 4.4.

12. a. The columns of $P_{\mathcal{C}\leftarrow\mathcal{B}}$ are linearly independent.

b. If V = $\mathbb{R}^2$, $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$, and $\mathcal{C} = \{\mathbf{c}_1, \mathbf{c}_2\}$, then row
reduction of $[\,\mathbf{c}_1 \ \mathbf{c}_2 \ \mathbf{b}_1 \ \mathbf{b}_2\,]$ to $[\,I \ P\,]$ produces a
matrix P that satisfies $[\mathbf{x}]_{\mathcal{B}} = P[\mathbf{x}]_{\mathcal{C}}$ for all x in V.

13. In $\mathbb{P}_2$, find the change-of-coordinates matrix from the basis
$\mathcal{B} = \{1 - 2t + t^2,\ 3 - 5t + 4t^2,\ 2t + 3t^2\}$ to the standard
basis $\mathcal{C} = \{1, t, t^2\}$. Then find the $\mathcal{B}$-coordinate vector for
$-1 + 2t$.

14. In $\mathbb{P}_2$, find the change-of-coordinates matrix from the basis
$\mathcal{B} = \{1 - 3t^2,\ 2 + t - 5t^2,\ 1 + 2t\}$ to the standard basis.
Then write $t^2$ as a linear combination of the polynomials in $\mathcal{B}$.


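Exercises 15 and 16 provide a proof of Theorem 15. Fill in a
justification for each step.

15. Given v in V, there exist scalars $x_1, \ldots, x_n$, such that

\[
\mathbf{v} = x_1\mathbf{b}_1 + x_2\mathbf{b}_2 + \cdots + x_n\mathbf{b}_n
\]

because (a) ____. Apply the coordinate mapping determined
by the basis $\mathcal{C}$, and obtain

\[
[\mathbf{v}]_{\mathcal{C}} = x_1[\mathbf{b}_1]_{\mathcal{C}} + x_2[\mathbf{b}_2]_{\mathcal{C}} + \cdots + x_n[\mathbf{b}_n]_{\mathcal{C}}
\]

because (b) ____. This equation may be written in the form

\[
[\mathbf{v}]_{\mathcal{C}} = \begin{bmatrix} [\mathbf{b}_1]_{\mathcal{C}} & [\mathbf{b}_2]_{\mathcal{C}} & \cdots & [\mathbf{b}_n]_{\mathcal{C}} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \tag{8}
\]

by the definition of (c) ____. This shows that the matrix
$P_{\mathcal{C}\leftarrow\mathcal{B}}$ shown in (5) satisfies $[\mathbf{v}]_{\mathcal{C}} = P_{\mathcal{C}\leftarrow\mathcal{B}}[\mathbf{v}]_{\mathcal{B}}$ for each v in V,
because the vector on the right side of (8) is (d) ____.


16. Suppose Q is any matrix such that

\[
[\mathbf{v}]_{\mathcal{C}} = Q[\mathbf{v}]_{\mathcal{B}} \quad\text{for each } \mathbf{v} \text{ in } V \tag{9}
\]

Set $\mathbf{v} = \mathbf{b}_1$ in (9). Then (9) shows that $[\mathbf{b}_1]_{\mathcal{C}}$ is the first column
of Q because (a) ____. Similarly, for k = 2, ..., n, the kth
column of Q is (b) ____ because (c) ____. This shows
that the matrix $P_{\mathcal{C}\leftarrow\mathcal{B}}$ defined by (5) in Theorem 15 is the only
matrix that satisfies condition (4).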

17. [M] Let B = {x 0 ,..., x 6 } and C = {y 0 ,..., y 6 }, where x k is 
the function cos A t and y k is the function cos kt. Exercise 34 
in Section 4.5 showed that both B and C are bases for the 
vector space H = Span (x 0 ,..., x 6 }. 

a. Set P = [ [ y 0 ••• [ y 6 ] B ], and calculate P~ l . 

b. Explain why the columns of P~ l are the C-coordinate 
vectors of x 0 ,..., x 6 . Then use these coordinate vectors 
to write trigonometric identities that express powers of 
cos t in terms of the functions in C. 

See the Study Guide. 

18. [M] (Calculus required ) 3 Recall from calculus that integrals 
such as 

/(5 co^-Sco^ + Scos^-IJco^M, (10, 

are tedious to compute. (The usual method is to apply inte¬ 
gration by parts repeatedly and use the half-angle formula.) 
Use the matrix P or P~ l from Exercise 17 to transform (10); 
then compute the integral. 


3 The idea for Exercises 17 and 18 and five related exercises in earlier 
sections came from a paper by Jack W. Rogers, Jr., of Auburn University, 
presented at a meeting of the International Linear Algebra Society, 
August 1995. See “Applications of Linear Algebra in Calculus,” American 
Mathematical Monthly 104 (1), 1997. 
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19. [M] Let 



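a. Find a basis $\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ for $\mathbb{R}^3$ such that $P$ is the
change-of-coordinates matrix from $\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ to the
basis $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. [Hint: What do the columns of $P_{\mathcal{C} \leftarrow \mathcal{B}}$
represent?]

b. Find a basis $\{\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3\}$ for $\mathbb{R}^3$ such that $P$ is the change-
of-coordinates matrix from $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ to $\{\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3\}$.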

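20. Let $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$, $\mathcal{C} = \{\mathbf{c}_1, \mathbf{c}_2\}$, and $\mathcal{D} = \{\mathbf{d}_1, \mathbf{d}_2\}$ be bases
for a two-dimensional vector space.

a. Write an equation that relates the matrices $P_{\mathcal{C} \leftarrow \mathcal{B}}$, $P_{\mathcal{D} \leftarrow \mathcal{C}}$,
and $P_{\mathcal{D} \leftarrow \mathcal{B}}$. Justify your result.

b. [M] Use a matrix program either to help you find the
equation or to check the equation you write. Work with
three bases for $\mathbb{R}^2$. (See Exercises 7-10.)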


SOLUTIONS TO PRACTICE PROBLEMS 


1. Since the columns of $P$ are $\mathcal{C}$-coordinate vectors, a vector of the form $P\mathbf{x}$ must be
a $\mathcal{C}$-coordinate vector. Thus $P$ satisfies equation (ii).

2. The coordinate vectors found in Example 1 show that 


Hence 



4.8 APPLICATIONS TO DIFFERENCE EQUATIONS 


Now that powerful computers are widely available, more and more scientific and 
engineering problems are being treated in a way that uses discrete, or digital, data rather 
than continuous data. Difference equations are often the appropriate tool to analyze 
such data. Even when a differential equation is used to model a continuous process, a 
numerical solution is often produced from a related difference equation. 

This section highlights some fundamental properties of linear difference equations 
that are best explained using linear algebra. 


Discrete-Time Signals 

The vector space $\mathbb{S}$ of discrete-time signals was introduced in Section 4.1. A signal in
$\mathbb{S}$ is a function defined only on the integers and is visualized as a sequence of numbers,
say, $\{y_k\}$. Figure 1 shows three typical signals whose general terms are $(.7)^k$, $1^k$, and
$(-1)^k$, respectively.



FIGURE 1 Three signals in $\mathbb{S}$: $y_k = (.7)^k$, $y_k = 1^k$, and $y_k = (-1)^k$.
Digital signals obviously arise in electrical and control systems engineering, but
discrete-data sequences are also generated in biology, physics, economics, demography,
and many other areas, wherever a process is measured, or sampled, at discrete time
intervals. When a process begins at a specific time, it is sometimes convenient to write
a signal as a sequence of the form $(y_0, y_1, y_2, \ldots)$. The terms $y_k$ for $k < 0$ either are
assumed to be zero or are simply omitted.

EXAMPLE 1 The crystal-clear sounds from a compact disc player are produced
from music that has been sampled at the rate of 44,100 times per second. See Figure 2.
At each measurement, the amplitude of the music signal is recorded as a number, say,
$y_k$. The original music is composed of many different sounds of varying frequencies,
yet the sequence $\{y_k\}$ contains enough information to reproduce all the frequencies
in the sound up to about 20,000 cycles per second, higher than the human ear can
sense. ■




Linear Independence in the Space § of Signals 


To simplify notation, we consider a set of only three signals in $\mathbb{S}$, say, $\{u_k\}$, $\{v_k\}$, and
$\{w_k\}$. They are linearly independent precisely when the equation

$$c_1 u_k + c_2 v_k + c_3 w_k = 0 \quad \text{for all } k \tag{1}$$

implies that $c_1 = c_2 = c_3 = 0$. The phrase "for all $k$" means for all integers: positive,
negative, and zero. One could also consider signals that start with $k = 0$, for example,
in which case, "for all $k$" would mean for all integers $k \geq 0$.

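Suppose $c_1, c_2, c_3$ satisfy (1). Then equation (1) holds for any three consecutive
values of $k$, say, $k$, $k + 1$, and $k + 2$. Thus (1) implies that

$$c_1 u_{k+1} + c_2 v_{k+1} + c_3 w_{k+1} = 0 \quad \text{for all } k$$

and

$$c_1 u_{k+2} + c_2 v_{k+2} + c_3 w_{k+2} = 0 \quad \text{for all } k$$

Hence $c_1, c_2, c_3$ satisfy

$$\begin{bmatrix} u_k & v_k & w_k \\ u_{k+1} & v_{k+1} & w_{k+1} \\ u_{k+2} & v_{k+2} & w_{k+2} \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \quad \text{for all } k \tag{2}$$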



The coefficient matrix in this system is called the Casorati matrix of the signals, and
the determinant of the matrix is called the Casoratian of $\{u_k\}$, $\{v_k\}$, and $\{w_k\}$. If
the Casorati matrix is invertible for at least one value of $k$, then (2) will imply that
$c_1 = c_2 = c_3 = 0$, which will prove that the three signals are linearly independent.
EXAMPLE 2 Verify that $1^k$, $(-2)^k$, and $3^k$ are linearly independent signals.

SOLUTION The Casorati matrix is

$$\begin{bmatrix} 1^k & (-2)^k & 3^k \\ 1^{k+1} & (-2)^{k+1} & 3^{k+1} \\ 1^{k+2} & (-2)^{k+2} & 3^{k+2} \end{bmatrix}$$


Row operations can show fairly easily that this matrix is always invertible. However, it
is faster to substitute a value for $k$ (say, $k = 0$) and row reduce the numerical matrix:

$$\begin{bmatrix} 1 & 1 & 1 \\ 1 & -2 & 3 \\ 1 & 4 & 9 \end{bmatrix} \sim \begin{bmatrix} 1 & 1 & 1 \\ 0 & -3 & 2 \\ 0 & 3 & 8 \end{bmatrix} \sim \begin{bmatrix} 1 & 1 & 1 \\ 0 & -3 & 2 \\ 0 & 0 & 10 \end{bmatrix}$$

The Casorati matrix is invertible for $k = 0$. So $1^k$, $(-2)^k$, and $3^k$ are linearly
independent. ■
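The Casorati test in Example 2 can also be checked numerically. The following is a minimal sketch in Python, not part of the text: the helper names `casorati_matrix` and `det` are illustrative, and the determinant uses plain cofactor expansion, which is adequate only for small matrices.

```python
def casorati_matrix(signals, k):
    """Return the Casorati matrix of the signals at index k.

    Row i holds the value of every signal at index k + i.
    """
    n = len(signals)
    return [[s(k + i) for s in signals] for i in range(n)]


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


# The three signals of Example 2: 1^k, (-2)^k, 3^k.
signals = [lambda k: 1 ** k, lambda k: (-2) ** k, lambda k: 3 ** k]

C0 = casorati_matrix(signals, 0)
casoratian = det(C0)
print(C0)          # [[1, 1, 1], [1, -2, 3], [1, 4, 9]]
print(casoratian)  # -30
```

A nonzero Casoratian at a single index (here $k = 0$) is enough to conclude that the signals are linearly independent; a zero value at one index would not, by itself, settle the question.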


If a Casorati matrix is not invertible, the associated signals being tested may or 
may not be linearly dependent. (See Exercise 33.) However, it can be shown that if 
the signals are all solutions of the same homogeneous difference equation (described 
below), then either the Casorati matrix is invertible for all k and the signals are linearly 
independent, or else the Casorati matrix is not invertible for all k and the signals are 
linearly dependent. A nice proof using linear transformations is in the Study Guide. 


Linear Difference Equations 

Given scalars $a_0, \ldots, a_n$, with $a_0$ and $a_n$ nonzero, and given a signal $\{z_k\}$, the equation

$$a_0 y_{k+n} + a_1 y_{k+n-1} + \cdots + a_{n-1} y_{k+1} + a_n y_k = z_k \quad \text{for all } k \tag{3}$$

is called a linear difference equation (or linear recurrence relation) of order $n$. For
simplicity, $a_0$ is often taken equal to 1. If $\{z_k\}$ is the zero sequence, the equation is
homogeneous; otherwise, the equation is nonhomogeneous.

EXAMPLE 3 In digital signal processing, a difference equation such as (3) describes
a linear filter, and $a_0, \ldots, a_n$ are called the filter coefficients. If $\{y_k\}$ is treated
as the input and $\{z_k\}$ as the output, then the solutions of the associated homogeneous
equation are the signals that are filtered out and transformed into the zero signal. Let us
feed two different signals into the filter

$$.35\,y_{k+2} + .5\,y_{k+1} + .35\,y_k = z_k$$

Here .35 is an abbreviation for $\sqrt{2}/4$. The first signal is created by sampling the
continuous signal $y = \cos(\pi t/4)$ at integer values of $t$, as in Figure 3(a). The discrete
signal is

$$\{y_k\} = \{\ldots, \cos(0), \cos(\pi/4), \cos(2\pi/4), \cos(3\pi/4), \ldots\}$$

For simplicity, write $\pm.7$ in place of $\pm\sqrt{2}/2$, so that

$$\{y_k\} = \{\ldots, 1, .7, 0, -.7, -1, -.7, 0, .7, 1, .7, 0, \ldots\}$$

where the first entry shown, 1, is the $k = 0$ term.

Table 1 shows a calculation of the output sequence $\{z_k\}$, where .35(.7) is an abbreviation
for $(\sqrt{2}/4)(\sqrt{2}/2) = .25$. The output is $\{y_k\}$, shifted by one term.
FIGURE 3 Discrete signals with different frequencies.


TABLE 1 Computing the Output of a Filter

 k    y_k    y_{k+1}   y_{k+2}    .35y_k   + .5y_{k+1} + .35y_{k+2}  =  z_k
 0    1      .7        0          .35(1)   + .5(.7)    + .35(0)      =  .7
 1    .7     0         -.7        .35(.7)  + .5(0)     + .35(-.7)    =  0
 2    0      -.7       -1         .35(0)   + .5(-.7)   + .35(-1)     =  -.7
 3    -.7    -1        -.7        .35(-.7) + .5(-1)    + .35(-.7)    =  -1
 4    -1     -.7       0          .35(-1)  + .5(-.7)   + .35(0)      =  -.7
 5    -.7    0         .7         .35(-.7) + .5(0)     + .35(.7)     =  0
 ...  ...    ...       ...                                              ...

A different input signal is produced from the higher frequency signal
$y = \cos(3\pi t/4)$, shown in Figure 3(b). Sampling at the same rate as before produces a
new input sequence:

$$\{w_k\} = \{\ldots, 1, -.7, 0, .7, -1, .7, 0, -.7, 1, -.7, 0, \ldots\}$$

where the first entry shown, 1, is the $k = 0$ term.

When $\{w_k\}$ is fed into the filter, the output is the zero sequence. The filter, called a
low-pass filter, lets $\{y_k\}$ pass through, but stops the higher frequency $\{w_k\}$.
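The behavior of the filter in Example 3 can be reproduced with a few lines of Python. This is an illustrative sketch only: the names `low_pass`, `y`, and `w` are not from the text, and the coefficient is taken as the exact value $\sqrt{2}/4$ rather than the abbreviation .35.

```python
import math

A = math.sqrt(2) / 4  # the coefficient abbreviated as .35 in the text


def low_pass(signal, k):
    """Output z_k = A*y_{k+2} + .5*y_{k+1} + A*y_k of the filter."""
    return A * signal(k + 2) + 0.5 * signal(k + 1) + A * signal(k)


def y(k):
    """Samples of cos(pi*t/4) at the integers (the low-frequency input)."""
    return math.cos(math.pi * k / 4)


def w(k):
    """Samples of cos(3*pi*t/4) at the integers (the high-frequency input)."""
    return math.cos(3 * math.pi * k / 4)


z_from_y = [low_pass(y, k) for k in range(6)]
z_from_w = [low_pass(w, k) for k in range(6)]

for k in range(6):
    print(k, round(z_from_y[k], 4), round(z_from_w[k], 4))
```

Up to floating-point rounding, the first output column should agree with $y_{k+1}$ (the input shifted by one term), and the second column should be zero.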


In many applications, a sequence {zk} is specified for the right side of a difference 
equation (3), and a {yk} that satisfies (3) is called a solution of the equation. The next 
example shows how to find solutions for a homogeneous equation. 


EXAMPLE 4 Solutions of a homogeneous difference equation often have the form
$y_k = r^k$ for some $r$. Find some solutions of the equation

$$y_{k+3} - 2y_{k+2} - 5y_{k+1} + 6y_k = 0 \quad \text{for all } k \tag{4}$$

SOLUTION Substitute $r^k$ for $y_k$ in the equation and factor the left side:

$$r^{k+3} - 2r^{k+2} - 5r^{k+1} + 6r^k = 0$$

$$r^k(r^3 - 2r^2 - 5r + 6) = 0 \tag{5}$$

$$r^k(r - 1)(r + 2)(r - 3) = 0 \tag{6}$$

Since (5) is equivalent to (6), $r^k$ satisfies the difference equation (4) if and only if $r$
satisfies (6). Thus $1^k$, $(-2)^k$, and $3^k$ are all solutions of (4). For instance, to verify that
$3^k$ is a solution of (4), compute

$$3^{k+3} - 2 \cdot 3^{k+2} - 5 \cdot 3^{k+1} + 6 \cdot 3^k = 3^k(27 - 18 - 15 + 6) = 0 \quad \text{for all } k \qquad ■$$
In general, a nonzero signal $r^k$ satisfies the homogeneous difference equation

$$y_{k+n} + a_1 y_{k+n-1} + \cdots + a_{n-1} y_{k+1} + a_n y_k = 0 \quad \text{for all } k$$

if and only if $r$ is a root of the auxiliary equation

$$r^n + a_1 r^{n-1} + \cdots + a_{n-1} r + a_n = 0$$

We will not consider the case in which $r$ is a repeated root of the auxiliary equation.
When the auxiliary equation has a complex root, the difference equation has solutions
of the form $s^k \cos k\omega$ and $s^k \sin k\omega$, for constants $s$ and $\omega$. This happened in Example 3.
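As a small illustration of the link between the auxiliary equation and solutions of the form $r^k$, the sketch below searches for integer roots of the auxiliary polynomial of equation (4) and then checks that each $r^k$ satisfies the difference equation. The function names are hypothetical, and the search assumes integer coefficients with $a_n \neq 0$; it will not find non-integer or complex roots.

```python
def auxiliary_poly(coeffs, r):
    """Evaluate r^n + a_1 r^(n-1) + ... + a_n by Horner's rule.

    coeffs is the list [a_1, ..., a_n].
    """
    value = 1
    for a in coeffs:
        value = value * r + a
    return value


def integer_roots(coeffs):
    """Integer roots of a monic integer polynomial must divide a_n."""
    a_n = coeffs[-1]
    divisors = [d for d in range(1, abs(a_n) + 1) if a_n % d == 0]
    return sorted(r for d in divisors for r in (d, -d)
                  if auxiliary_poly(coeffs, r) == 0)


def satisfies(coeffs, r, k):
    """Check that y_k = r^k satisfies the homogeneous equation at index k."""
    n = len(coeffs)
    total = r ** (k + n)
    for i, a in enumerate(coeffs, start=1):
        total += a * r ** (k + n - i)
    return total == 0


coeffs = [-2, -5, 6]           # equation (4): a_1 = -2, a_2 = -5, a_3 = 6
roots = integer_roots(coeffs)
print(roots)                   # [-2, 1, 3]
print(all(satisfies(coeffs, r, k) for r in roots for k in range(6)))
```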

Solution Sets of Linear Difference Equations 

Given $a_1, \ldots, a_n$, consider the mapping $T : \mathbb{S} \to \mathbb{S}$ that transforms a signal $\{y_k\}$ into a
signal $\{w_k\}$ given by

$$w_k = y_{k+n} + a_1 y_{k+n-1} + \cdots + a_{n-1} y_{k+1} + a_n y_k$$

It is readily checked that $T$ is a linear transformation. This implies that the solution set
of the homogeneous equation

$$y_{k+n} + a_1 y_{k+n-1} + \cdots + a_{n-1} y_{k+1} + a_n y_k = 0 \quad \text{for all } k$$

is the kernel of $T$ (the set of signals that $T$ maps into the zero signal), and hence the
solution set is a subspace of $\mathbb{S}$. Any linear combination of solutions is again a solution.

The next theorem, a simple but basic result, will lead to more information about the 
solution sets of difference equations. 


THEOREM 16 If $a_n \neq 0$ and if $\{z_k\}$ is given, the equation

$$y_{k+n} + a_1 y_{k+n-1} + \cdots + a_{n-1} y_{k+1} + a_n y_k = z_k \quad \text{for all } k \tag{7}$$

has a unique solution whenever $y_0, \ldots, y_{n-1}$ are specified.


PROOF If $y_0, \ldots, y_{n-1}$ are specified, use (7) to define

$$y_n = z_0 - [\,a_1 y_{n-1} + \cdots + a_{n-1} y_1 + a_n y_0\,]$$

And now that $y_1, \ldots, y_n$ are specified, use (7) to define $y_{n+1}$. In general, use the
recurrence relation

$$y_{n+k} = z_k - [\,a_1 y_{k+n-1} + \cdots + a_n y_k\,] \tag{8}$$

to define $y_{n+k}$ for $k \geq 0$. To define $y_k$ for $k < 0$, use the recurrence relation

$$y_k = \frac{1}{a_n} z_k - \frac{1}{a_n} [\,y_{k+n} + a_1 y_{k+n-1} + \cdots + a_{n-1} y_{k+1}\,] \tag{9}$$

This produces a signal that satisfies (7). Conversely, any signal that satisfies (7) for all
$k$ certainly satisfies (8) and (9), so the solution of (7) is unique. ■
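The forward recurrence (8) used in the proof is directly computable. The sketch below, with the illustrative name `solve_forward`, builds $y_0, y_1, \ldots$ from the initial values, and applies it to the homogeneous equation (4) of Example 4; it covers only indices $k \geq 0$ and does not implement the backward recurrence (9).

```python
def solve_forward(coeffs, z, initial, count):
    """Return [y_0, ..., y_{count-1}] for the equation
    y_{k+n} + a_1 y_{k+n-1} + ... + a_n y_k = z_k.

    coeffs  -- [a_1, ..., a_n]
    z       -- function giving z_k
    initial -- [y_0, ..., y_{n-1}]
    """
    n = len(coeffs)
    y = list(initial)
    for k in range(count - n):
        # Recurrence (8): y_{k+n} = z_k - (a_1 y_{k+n-1} + ... + a_n y_k)
        tail = sum(a * y[k + n - 1 - i] for i, a in enumerate(coeffs))
        y.append(z(k) - tail)
    return y


coeffs = [-2, -5, 6]              # equation (4)
zero = lambda k: 0                # homogeneous case

from_powers_of_3 = solve_forward(coeffs, zero, [1, 3, 9], 8)
from_powers_of_minus_2 = solve_forward(coeffs, zero, [1, -2, 4], 8)
print(from_powers_of_3)           # [1, 3, 9, 27, 81, 243, 729, 2187]
print(from_powers_of_minus_2)     # [1, -2, 4, -8, 16, -32, 64, -128]
```

Starting from the first three terms of $3^k$ or $(-2)^k$, the recurrence regenerates the same signal, which is what the uniqueness statement in Theorem 16 predicts.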


THEOREM 17 The set $H$ of all solutions of the $n$th-order homogeneous linear difference equation

$$y_{k+n} + a_1 y_{k+n-1} + \cdots + a_{n-1} y_{k+1} + a_n y_k = 0 \quad \text{for all } k \tag{10}$$

is an $n$-dimensional vector space.
PROOF As was pointed out earlier, $H$ is a subspace of $\mathbb{S}$ because $H$ is the kernel
of a linear transformation. For $\{y_k\}$ in $H$, let $F\{y_k\}$ be the vector in $\mathbb{R}^n$ given by
$(y_0, y_1, \ldots, y_{n-1})$. It is readily verified that $F : H \to \mathbb{R}^n$ is a linear transformation.
Given any vector $(y_0, y_1, \ldots, y_{n-1})$ in $\mathbb{R}^n$, Theorem 16 says that there is a unique
signal $\{y_k\}$ in $H$ such that $F\{y_k\} = (y_0, y_1, \ldots, y_{n-1})$. This means that $F$ is a one-
to-one linear transformation of $H$ onto $\mathbb{R}^n$; that is, $F$ is an isomorphism. Thus
$\dim H = \dim \mathbb{R}^n = n$. (See Exercise 32 in Section 4.5.) ■

EXAMPLE 5 Find a basis for the set of all solutions to the difference equation

$$y_{k+3} - 2y_{k+2} - 5y_{k+1} + 6y_k = 0 \quad \text{for all } k$$

SOLUTION Our work in linear algebra really pays off now! We know from Examples 2
and 4 that $1^k$, $(-2)^k$, and $3^k$ are linearly independent solutions. In general, it can be
difficult to verify directly that a set of signals spans the solution space. But that is no
problem here because of two key theorems: Theorem 17, which shows that the solution
space is exactly three-dimensional, and the Basis Theorem in Section 4.5, which says
that a linearly independent set of $n$ vectors in an $n$-dimensional space is automatically
a basis. So $1^k$, $(-2)^k$, and $3^k$ form a basis for the solution space. ■

The standard way to describe the “general solution” of the difference equation (10) 
is to exhibit a basis for the subspace of all solutions. Such a basis is usually called a 
fundamental set of solutions of (10). In practice, if you can find n linearly independent 
signals that satisfy (10), they will automatically span the n -dimensional solution space, 
as explained in Example 5. 

Nonhomogeneous Equations 

The general solution of the nonhomogeneous difference equation

$$y_{k+n} + a_1 y_{k+n-1} + \cdots + a_{n-1} y_{k+1} + a_n y_k = z_k \quad \text{for all } k \tag{11}$$

can be written as one particular solution of (11) plus an arbitrary linear combination of
a fundamental set of solutions of the corresponding homogeneous equation (10). This
fact is analogous to the result in Section 1.5 showing that the solution sets of $A\mathbf{x} = \mathbf{b}$
and $A\mathbf{x} = \mathbf{0}$ are parallel. Both results have the same explanation: The mapping $\mathbf{x} \mapsto A\mathbf{x}$
is linear, and the mapping that transforms the signal $\{y_k\}$ into the signal $\{z_k\}$ in (11) is
linear. See Exercise 35.

EXAMPLE 6 Verify that the signal $y_k = k^2$ satisfies the difference equation

$$y_{k+2} - 4y_{k+1} + 3y_k = -4k \quad \text{for all } k \tag{12}$$

Then find a description of all solutions of this equation.

SOLUTION Substitute $k^2$ for $y_k$ on the left side of (12):

$$(k + 2)^2 - 4(k + 1)^2 + 3k^2 = (k^2 + 4k + 4) - 4(k^2 + 2k + 1) + 3k^2 = -4k$$

So $k^2$ is indeed a solution of (12). The next step is to solve the homogeneous equation

$$y_{k+2} - 4y_{k+1} + 3y_k = 0 \tag{13}$$

The auxiliary equation is

$$r^2 - 4r + 3 = (r - 1)(r - 3) = 0$$
FIGURE 4 Solution sets of difference equations (12) and (13).


The roots are $r = 1, 3$. So two solutions of the homogeneous difference equation are $1^k$
and $3^k$. They are obviously not multiples of each other, so they are linearly independent
signals. By Theorem 17, the solution space is two-dimensional, so $3^k$ and $1^k$ form a basis
for the set of solutions of equation (13). Translating that set by a particular solution of
the nonhomogeneous equation (12), we obtain the general solution of (12):

$$k^2 + c_1 1^k + c_2 3^k, \quad \text{or} \quad k^2 + c_1 + c_2 3^k$$

Figure 4 gives a geometric visualization of the two solution sets. Each point in the figure
corresponds to one signal in $\mathbb{S}$.
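A quick numerical spot check of the general solution in Example 6 is sketched below. The names `general_solution` and `residual` are illustrative, and checking a handful of constants and indices is only evidence, not a proof; the algebra in the example is the actual argument.

```python
def general_solution(c1, c2):
    """Return the signal y_k = k^2 + c1 + c2 * 3^k as a function of k."""
    return lambda k: k ** 2 + c1 + c2 * 3 ** k


def residual(y, k):
    """Left side minus right side of equation (12) at index k."""
    return y(k + 2) - 4 * y(k + 1) + 3 * y(k) - (-4 * k)


# Try several choices of the arbitrary constants c1 and c2.
checks = [residual(general_solution(c1, c2), k)
          for c1 in (-3, 0, 5)
          for c2 in (-1, 0, 2)
          for k in range(6)]
print(all(value == 0 for value in checks))
```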


Reduction to Systems of First-Order Equations 

A modern way to study a homogeneous nth-order linear difference equation is to replace 
it by an equivalent system of first-order difference equations, written in the form 

$$\mathbf{x}_{k+1} = A\mathbf{x}_k \quad \text{for all } k$$

where the vectors $\mathbf{x}_k$ are in $\mathbb{R}^n$ and $A$ is an $n \times n$ matrix.

A simple example of such a (vector-valued) difference equation was already studied 
in Section 1.10. Further examples will be covered in Sections 4.9 and 5.6. 


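EXAMPLE 7 Write the following difference equation as a first-order system:

$$y_{k+3} - 2y_{k+2} - 5y_{k+1} + 6y_k = 0 \quad \text{for all } k$$

SOLUTION For each $k$, set

$$\mathbf{x}_k = \begin{bmatrix} y_k \\ y_{k+1} \\ y_{k+2} \end{bmatrix}$$

The difference equation says that $y_{k+3} = -6y_k + 5y_{k+1} + 2y_{k+2}$, so

$$\mathbf{x}_{k+1} = \begin{bmatrix} y_{k+1} \\ y_{k+2} \\ y_{k+3} \end{bmatrix} = \begin{bmatrix} 0 + y_{k+1} + 0 \\ 0 + 0 + y_{k+2} \\ -6y_k + 5y_{k+1} + 2y_{k+2} \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -6 & 5 & 2 \end{bmatrix} \begin{bmatrix} y_k \\ y_{k+1} \\ y_{k+2} \end{bmatrix}$$

That is,

$$\mathbf{x}_{k+1} = A\mathbf{x}_k \quad \text{for all } k, \quad \text{where } A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -6 & 5 & 2 \end{bmatrix}$$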


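In general, the equation

$$y_{k+n} + a_1 y_{k+n-1} + \cdots + a_{n-1} y_{k+1} + a_n y_k = 0 \quad \text{for all } k$$

can be rewritten as $\mathbf{x}_{k+1} = A\mathbf{x}_k$ for all $k$, where

$$\mathbf{x}_k = \begin{bmatrix} y_k \\ y_{k+1} \\ \vdots \\ y_{k+n-1} \end{bmatrix}, \quad A = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -a_n & -a_{n-1} & -a_{n-2} & \cdots & -a_1 \end{bmatrix}$$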
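The construction of the matrix $A$ from the coefficients $a_1, \ldots, a_n$ can be sketched as follows. The helper names `companion` and `mat_vec` are illustrative and not from the text; the example uses the equation of Example 7.

```python
def companion(coeffs):
    """Build the n x n matrix A for y_{k+n} + a_1 y_{k+n-1} + ... + a_n y_k = 0.

    coeffs is [a_1, ..., a_n]. The first n-1 rows shift the entries of x_k
    up by one position; the last row is [-a_n, -a_{n-1}, ..., -a_1].
    """
    n = len(coeffs)
    rows = [[1 if j == i + 1 else 0 for j in range(n)] for i in range(n - 1)]
    rows.append([-a for a in reversed(coeffs)])
    return rows


def mat_vec(A, x):
    """Matrix-vector product A x using plain lists."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


A = companion([-2, -5, 6])
print(A)  # [[0, 1, 0], [0, 0, 1], [-6, 5, 2]]

# Starting from x_0 = (y_0, y_1, y_2) = (1, 3, 9), the solution 3^k,
# one step gives x_1 = (y_1, y_2, y_3).
x1 = mat_vec(A, [1, 3, 9])
print(x1)  # [3, 9, 27]
```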
PRACTICE PROBLEM

It can be shown that the signals $2^k$, $3^k \sin\frac{k\pi}{2}$, and $3^k \cos\frac{k\pi}{2}$ are solutions of

$$y_{k+3} - 2y_{k+2} + 9y_{k+1} - 18y_k = 0$$

Show that these signals form a basis for the set of all solutions of the difference equation.


4.8 EXERCISES 

Verify that the signals in Exercises 1 and 2 are solutions of the 
accompanying difference equation. 

1. $2^k$, $(-4)^k$; $y_{k+2} + 2y_{k+1} - 8y_k = 0$

2. $3^k$, $(-3)^k$; $y_{k+2} - 9y_k = 0$

Show that the signals in Exercises 3-6 form a basis for the solution 
set of the accompanying difference equation. 

3. The signals and equation in Exercise 1 

4. The signals and equation in Exercise 2 

5. $(-3)^k$, $k(-3)^k$; $y_{k+2} + 6y_{k+1} + 9y_k = 0$

6. $5^k \cos\frac{k\pi}{2}$, $5^k \sin\frac{k\pi}{2}$; $y_{k+2} + 25y_k = 0$

In Exercises 7-12, assume the signals listed are solutions of the 
given difference equation. Determine if the signals form a basis 
for the solution space of the equation. Justify your answers using 
appropriate theorems. 

7. $1^k$, $2^k$, $(-2)^k$; $y_{k+3} - y_{k+2} - 4y_{k+1} + 4y_k = 0$

8. $2^k$, $4^k$, $(-5)^k$; $y_{k+3} - y_{k+2} - 22y_{k+1} + 40y_k = 0$

9. $1^k$, $3^k \cos\frac{k\pi}{2}$, $3^k \sin\frac{k\pi}{2}$; $y_{k+3} - y_{k+2} + 9y_{k+1} - 9y_k = 0$

10. $(-1)^k$, $k(-1)^k$, $5^k$; $y_{k+3} - 3y_{k+2} - 9y_{k+1} - 5y_k = 0$

11. $(-1)^k$, $3^k$; $y_{k+3} + y_{k+2} - 9y_{k+1} - 9y_k = 0$

12. $1^k$, $(-1)^k$; $y_{k+4} - 2y_{k+2} + y_k = 0$


In Exercises 13-16, find a basis for the solution space of the 
difference equation. Prove that the solutions you find span the 
solution set. 

13. $y_{k+2} - y_{k+1} + \frac{2}{9} y_k = 0$
14. $y_{k+2} - 7y_{k+1} + 12y_k = 0$
15. $y_{k+2} - 25y_k = 0$
16. $16y_{k+2} + 8y_{k+1} - 3y_k = 0$

Exercises 17 and 18 concern a simple model of the national 
economy described by the difference equation 

$$Y_{k+2} - a(1 + b)Y_{k+1} + abY_k = 1 \tag{14}$$

Here $Y_k$ is the total national income during year $k$, $a$ is a constant
less than 1, called the marginal propensity to consume, and $b$ is
a positive constant of adjustment that describes how changes in
consumer spending affect the annual rate of private investment.¹

17. Find the general solution of equation (14) when $a = .9$ and
$b = \frac{4}{9}$. What happens to $Y_k$ as $k$ increases? [Hint: First find a
particular solution of the form $Y_k = T$, where $T$ is a constant,
called the equilibrium level of national income.]

18. Find the general solution of equation (14) when a = .9 and 
b = .5. 


1 For example, see Discrete Dynamical Systems, by James T. Sandefur 
(Oxford: Clarendon Press, 1990), pp. 267-276. The original 
accelerator-multiplier model is attributed to the economist P. A. 
Samuelson. 
A lightweight cantilevered beam is supported at $N$ points spaced
10 ft apart, and a weight of 500 lb is placed at the end of the
beam, 10 ft from the first support, as in the figure. Let $y_k$ be
the bending moment at the $k$th support. Then $y_1 = 5000$ ft-lb.
Suppose the beam is rigidly attached at the $N$th support and the
bending moment there is zero. In between, the moments satisfy
the three-moment equation

$$y_{k+2} + 4y_{k+1} + y_k = 0 \quad \text{for } k = 1, 2, \ldots, N - 2 \tag{15}$$



Bending moments on a cantilevered beam.




19. Find the general solution of difference equation (15). Justify 
your answer. 

20. Find the particular solution of (15) that satisfies the boundary
conditions $y_1 = 5000$ and $y_N = 0$. (The answer involves $N$.)

21. When a signal is produced from a sequence of measurements 
made on a process (a chemical reaction, a flow of heat 
through a tube, a moving robot arm, etc.), the signal usually 
contains random noise produced by measurement errors. A 
standard method of preprocessing the data to reduce the noise 
is to smooth or filter the data. One simple filter is a moving 
average that replaces each yk by its average with the two 
adjacent values: 

$$\tfrac{1}{3} y_{k+1} + \tfrac{1}{3} y_k + \tfrac{1}{3} y_{k-1} = z_k \quad \text{for } k = 1, 2, \ldots$$

Suppose a signal $y_k$, for $k = 0, \ldots, 14$, is

9, 5, 7, 3, 2, 4, 6, 5, 7, 6, 8, 10, 9, 5, 7

Use the filter to compute $z_1, \ldots, z_{13}$. Make a broken-line
graph that superimposes the original signal and the smoothed 
signal. 

22. Let $\{y_k\}$ be the sequence produced by sampling the continuous
signal $2\cos\frac{\pi t}{4} + \cos\frac{3\pi t}{4}$ at $t = 0, 1, 2, \ldots$, as shown in
the figure. The values of $y_k$, beginning with $k = 0$, are

3, .7, 0, -.7, -3, -.7, 0, .7, 3, .7, 0, ...

where .7 is an abbreviation for $\sqrt{2}/2$.

a. Compute the output signal $\{z_k\}$ when $\{y_k\}$ is fed into the
filter in Example 3. 

b. Explain how and why the output in part (a) is related to 
the calculations in Example 3. 


Exercises 23 and 24 refer to a difference equation of the form 

$y_{k+1} - ay_k = b$, for suitable constants $a$ and $b$.

23. A loan of $10,000 has an interest rate of 1% per month and a
monthly payment of $450. The loan is made at month $k = 0$,
and the first payment is made one month later, at $k = 1$. For
$k = 0, 1, 2, \ldots$, let $y_k$ be the unpaid balance of the loan just
after the $k$th monthly payment. Thus

$$\underbrace{y_1}_{\text{New balance}} = \underbrace{10{,}000}_{\text{Balance due}} + \underbrace{(.01)10{,}000}_{\text{Interest added}} - \underbrace{450}_{\text{Payment}}$$

a. Write a difference equation satisfied by $\{y_k\}$.

b. [M] Create a table showing k and the balance yk at month 
k . List the program or the keystrokes you used to create 
the table. 

c. [M] What will k be when the last payment is made? How 
much will the last payment be? How much money did the 
borrower pay in total? 

24. At time $k = 0$, an initial investment of $1000 is made into a
savings account that pays 6% interest per year compounded
monthly. (The interest rate per month is .005.) Each month
after the initial investment, an additional $200 is added to
the account. For $k = 0, 1, 2, \ldots$, let $y_k$ be the amount in the
account at time $k$, just after a deposit has been made.

a. Write a difference equation satisfied by $\{y_k\}$.

b. [M] Create a table showing k and the total amount in the 
savings account at month k, for k = 0 through 60. List 
your program or the keystrokes you used to create the 
table. 

c. [M] How much will be in the account after two years (that 
is, 24 months), four years, and five years? How much of 
the five-year total is interest? 

In Exercises 25-28, show that the given signal is a solution of
the difference equation. Then find the general solution of that
difference equation.

25. $y_k = k^2$; $y_{k+2} + 3y_{k+1} - 4y_k = 7 + 10k$

26. $y_k = 1 + k$; $y_{k+2} - 8y_{k+1} + 15y_k = 2 + 8k$

27. $y_k = 2 - 2k$; $y_{k+2} - \frac{9}{2} y_{k+1} + 2y_k = 2 + 3k$

28. $y_k = 2k - 4$; $y_{k+2} + \frac{3}{2} y_{k+1} - y_k = 1 + 3k$









































Write the difference equations in Exercises 29 and 30 as first-order
systems, $\mathbf{x}_{k+1} = A\mathbf{x}_k$, for all $k$.

29. $y_{k+4} - 6y_{k+3} + 8y_{k+2} + 6y_{k+1} - 9y_k = 0$

30. $y_{k+3} - \frac{3}{4} y_{k+2} + \frac{1}{16} y_k = 0$

31. Is the following difference equation of order 3? Explain.

$$y_{k+3} + 5y_{k+2} + 6y_{k+1} = 0$$

32. What is the order of the following difference equation? Explain
your answer.

$$y_{k+3} + a_1 y_{k+2} + a_2 y_{k+1} + a_3 y_k = 0$$

33. Let $y_k = k^2$ and $z_k = 2k|k|$. Are the signals $\{y_k\}$ and $\{z_k\}$
linearly independent? Evaluate the associated Casorati matrix
$C(k)$ for $k = 0$, $k = -1$, and $k = -2$, and discuss your
results.

34. Let $f$, $g$, and $h$ be linearly independent functions defined for
all real numbers, and construct three signals by sampling the
values of the functions at the integers:

$$u_k = f(k), \quad v_k = g(k), \quad w_k = h(k)$$

Must the signals be linearly independent in $\mathbb{S}$? Discuss.

35. Let $a$ and $b$ be nonzero numbers. Show that the mapping $T$
defined by $T\{y_k\} = \{w_k\}$, where

$$w_k = y_{k+2} + ay_{k+1} + by_k$$

is a linear transformation from $\mathbb{S}$ into $\mathbb{S}$.

36. Let $V$ be a vector space, and let $T : V \to V$ be a linear transformation.
Given $\mathbf{z}$ in $V$, suppose $\mathbf{x}_p$ in $V$ satisfies $T(\mathbf{x}_p) = \mathbf{z}$,
and let $\mathbf{u}$ be any vector in the kernel of $T$. Show that $\mathbf{u} + \mathbf{x}_p$
satisfies the nonhomogeneous equation $T(\mathbf{x}) = \mathbf{z}$.

37. Let $\mathbb{S}_0$ be the vector space of all sequences of the form
$(y_0, y_1, y_2, \ldots)$, and define linear transformations $T$ and $D$
from $\mathbb{S}_0$ into $\mathbb{S}_0$ by

$$T(y_0, y_1, y_2, \ldots) = (y_1, y_2, y_3, \ldots)$$

$$D(y_0, y_1, y_2, \ldots) = (0, y_0, y_1, y_2, \ldots)$$

Show that $TD = I$ (the identity transformation on $\mathbb{S}_0$) and
yet $DT \neq I$.


SOLUTION TO PRACTICE PROBLEM 


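Examine the Casorati matrix:

$$C(k) = \begin{bmatrix} 2^k & 3^k \sin\frac{k\pi}{2} & 3^k \cos\frac{k\pi}{2} \\ 2^{k+1} & 3^{k+1} \sin\frac{(k+1)\pi}{2} & 3^{k+1} \cos\frac{(k+1)\pi}{2} \\ 2^{k+2} & 3^{k+2} \sin\frac{(k+2)\pi}{2} & 3^{k+2} \cos\frac{(k+2)\pi}{2} \end{bmatrix}$$

Set $k = 0$ and row reduce the matrix to verify that it has three pivot positions and hence
is invertible:

$$C(0) = \begin{bmatrix} 1 & 0 & 1 \\ 2 & 3 & 0 \\ 4 & 0 & -9 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 1 \\ 0 & 3 & -2 \\ 0 & 0 & -13 \end{bmatrix}$$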


The Casorati matrix is invertible at k = 0, so the signals are linearly independent. 
Since there are three signals, and the solution space H of the difference equation has 
dimension 3 (Theorem 17), the signals form a basis for H, by the Basis Theorem. 


4.9 APPLICATIONS TO MARKOV CHAINS 


The Markov chains described in this section are used as mathematical models of a 
wide variety of situations in biology, business, chemistry, engineering, physics, and 
elsewhere. In each case, the model is used to describe an experiment or measurement 
that is performed many times in the same way, where the outcome of each trial of the 
experiment will be one of several specified possible outcomes, and where the outcome 
of one trial depends only on the immediately preceding trial. 

For example, if the population of a city and its suburbs were measured each year,
then a vector such as

$$\mathbf{x}_0 = \begin{bmatrix} .60 \\ .40 \end{bmatrix} \tag{1}$$
could indicate that 60% of the population lives in the city and 40% in the suburbs. The
decimals in $\mathbf{x}_0$ add up to 1 because they account for the entire population of the region.
Percentages are more convenient for our purposes here than population totals.

A vector with nonnegative entries that add up to 1 is called a probability vector. A
stochastic matrix is a square matrix whose columns are probability vectors. A Markov
chain is a sequence of probability vectors $\mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2, \ldots$, together with a stochastic
matrix $P$, such that

$$\mathbf{x}_1 = P\mathbf{x}_0, \quad \mathbf{x}_2 = P\mathbf{x}_1, \quad \mathbf{x}_3 = P\mathbf{x}_2, \quad \ldots$$

Thus the Markov chain is described by the first-order difference equation

$$\mathbf{x}_{k+1} = P\mathbf{x}_k \quad \text{for } k = 0, 1, 2, \ldots$$

When a Markov chain of vectors in $\mathbb{R}^n$ describes a system or a sequence of
experiments, the entries in $\mathbf{x}_k$ list, respectively, the probabilities that the system is in
each of $n$ possible states, or the probabilities that the outcome of the experiment is one
of $n$ possible outcomes. For this reason, $\mathbf{x}_k$ is often called a state vector.


EXAMPLE 1 Section 1.10 examined a model for population movement between a
city and its suburbs. See Figure 1. The annual migration between these two parts of the
metropolitan region was governed by the migration matrix $M$:

$$M = \begin{bmatrix} .95 & .03 \\ .05 & .97 \end{bmatrix}$$

(columns: from City, from Suburbs; rows: to City, to Suburbs)

That is, each year 5% of the city population moves to the suburbs, and 3% of the
suburban population moves to the city. The columns of $M$ are probability vectors, so
$M$ is a stochastic matrix. Suppose the 2014 population of the region is 600,000 in the
city and 400,000 in the suburbs. Then the initial distribution of the population in the
region is given by $\mathbf{x}_0$ in (1) above. What is the distribution of the population in 2015?
In 2016?




FIGURE 1 Annual percentage migration between city and suburbs. 


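SOLUTION In Example 3 of Section 1.10, we saw that after one year, the population
vector $\begin{bmatrix} 600{,}000 \\ 400{,}000 \end{bmatrix}$ changed to

$$\begin{bmatrix} .95 & .03 \\ .05 & .97 \end{bmatrix} \begin{bmatrix} 600{,}000 \\ 400{,}000 \end{bmatrix} = \begin{bmatrix} 582{,}000 \\ 418{,}000 \end{bmatrix}$$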
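If we divide both sides of this equation by the total population of 1 million, and use the
fact that $kM\mathbf{x} = M(k\mathbf{x})$, we find that

$$\begin{bmatrix} .95 & .03 \\ .05 & .97 \end{bmatrix} \begin{bmatrix} .600 \\ .400 \end{bmatrix} = \begin{bmatrix} .582 \\ .418 \end{bmatrix}$$

The vector $\mathbf{x}_1 = \begin{bmatrix} .582 \\ .418 \end{bmatrix}$ gives the population distribution in 2015. That is, 58.2% of
the region lived in the city and 41.8% lived in the suburbs. Similarly, the population
distribution in 2016 is described by a vector $\mathbf{x}_2$, where

$$\mathbf{x}_2 = M\mathbf{x}_1 = \begin{bmatrix} .95 & .03 \\ .05 & .97 \end{bmatrix} \begin{bmatrix} .582 \\ .418 \end{bmatrix} \approx \begin{bmatrix} .565 \\ .435 \end{bmatrix}$$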

EXAMPLE 2 Suppose the voting results of a congressional election at a certain
voting precinct are represented by a vector $\mathbf{x}$ in $\mathbb{R}^3$:

$$\mathbf{x} = \begin{bmatrix} \text{\% voting Democratic (D)} \\ \text{\% voting Republican (R)} \\ \text{\% voting Libertarian (L)} \end{bmatrix}$$

Suppose we record the outcome of the congressional election every two years by a vector 
of this type and the outcome of one election depends only on the results of the preceding 
election. Then the sequence of vectors that describe the votes every two years may be a 
Markov chain. As an example of a stochastic matrix P for this chain, we take 



$$P = \begin{bmatrix} .70 & .10 & .30 \\ .20 & .80 & .30 \\ .10 & .10 & .40 \end{bmatrix}$$

(columns: from D, from R, from L; rows: to D, to R, to L)

The entries in the first column, labeled D, describe what the persons voting Democratic 
in one election will do in the next election. Here we have supposed that 70% will vote D 
again in the next election, 20% will vote R, and 10% will vote L. Similar interpretations 
hold for the other columns of P . A diagram for this matrix is shown in Figure 2. 


FIGURE 2 Voting changes from one election to the next.











































If the "transition" percentages remain constant over many years from one election
to the next, then the sequence of vectors that give the voting outcomes forms a Markov
chain. Suppose the outcome of one election is given by

$$\mathbf{x}_0 = \begin{bmatrix} .55 \\ .40 \\ .05 \end{bmatrix}$$


Determine the likely outcome of the next election and the likely outcome of the election 
after that. 

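SOLUTION The outcome of the next election is described by the state vector $\mathbf{x}_1$ and
that of the election after that by $\mathbf{x}_2$, where

$$\mathbf{x}_1 = P\mathbf{x}_0 = \begin{bmatrix} .70 & .10 & .30 \\ .20 & .80 & .30 \\ .10 & .10 & .40 \end{bmatrix} \begin{bmatrix} .55 \\ .40 \\ .05 \end{bmatrix} = \begin{bmatrix} .440 \\ .445 \\ .115 \end{bmatrix}$$

44% will vote D.
44.5% will vote R.
11.5% will vote L.

$$\mathbf{x}_2 = P\mathbf{x}_1 = \begin{bmatrix} .70 & .10 & .30 \\ .20 & .80 & .30 \\ .10 & .10 & .40 \end{bmatrix} \begin{bmatrix} .440 \\ .445 \\ .115 \end{bmatrix} = \begin{bmatrix} .3870 \\ .4785 \\ .1345 \end{bmatrix}$$

38.7% will vote D.
47.8% will vote R.
13.5% will vote L.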


To understand why $\mathbf{x}_1$ does indeed give the outcome of the next election, suppose 1000
persons voted in the "first" election, with 550 voting D, 400 voting R, and 50 voting L.
(See the percentages in $\mathbf{x}_0$.) In the next election, 70% of the 550 will vote D again, 10%
of the 400 will switch from R to D, and 30% of the 50 will switch from L to D. Thus
the total D vote will be

$$.70(550) + .10(400) + .30(50) = 385 + 40 + 15 = 440 \tag{2}$$

Thus 44% of the vote next time will be for the D candidate. The calculation in (2) is
essentially the same as that used to compute the first entry in $\mathbf{x}_1$. Analogous calculations
could be made for the other entries in $\mathbf{x}_1$, for the entries in $\mathbf{x}_2$, and so on. ■


Predicting the Distant Future 

The most interesting aspect of Markov chains is the study of a chain's long-term
behavior. For instance, what can be said in Example 2 about the voting after many
elections have passed (assuming that the given stochastic matrix continues to describe
the transition percentages from one election to the next)? Or, what happens to the
population distribution in Example 1 "in the long run"? Before answering these questions,
we turn to a numerical example.



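EXAMPLE 3 Let $P = \begin{bmatrix} .5 & .2 & .3 \\ .3 & .8 & .3 \\ .2 & 0 & .4 \end{bmatrix}$ and $\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$. Consider a system whose
state is described by the Markov chain $\mathbf{x}_{k+1} = P\mathbf{x}_k$, for $k = 0, 1, \ldots$. What happens to
the system as time passes? Compute the state vectors $\mathbf{x}_1, \ldots, \mathbf{x}_{15}$ to find out.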
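SOLUTION

$$\mathbf{x}_1 = P\mathbf{x}_0 = \begin{bmatrix} .5 & .2 & .3 \\ .3 & .8 & .3 \\ .2 & 0 & .4 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} .5 \\ .3 \\ .2 \end{bmatrix}$$

$$\mathbf{x}_2 = P\mathbf{x}_1 = \begin{bmatrix} .5 & .2 & .3 \\ .3 & .8 & .3 \\ .2 & 0 & .4 \end{bmatrix} \begin{bmatrix} .5 \\ .3 \\ .2 \end{bmatrix} = \begin{bmatrix} .37 \\ .45 \\ .18 \end{bmatrix}$$

$$\mathbf{x}_3 = P\mathbf{x}_2 = \begin{bmatrix} .5 & .2 & .3 \\ .3 & .8 & .3 \\ .2 & 0 & .4 \end{bmatrix} \begin{bmatrix} .37 \\ .45 \\ .18 \end{bmatrix} = \begin{bmatrix} .329 \\ .525 \\ .146 \end{bmatrix}$$

The results of further calculations are shown below, with entries rounded to four or five
significant figures.

$$\mathbf{x}_4 = \begin{bmatrix} .3133 \\ .5625 \\ .1242 \end{bmatrix}, \quad \mathbf{x}_5 = \begin{bmatrix} .3064 \\ .5813 \\ .1123 \end{bmatrix}, \quad \mathbf{x}_6 = \begin{bmatrix} .3032 \\ .5906 \\ .1062 \end{bmatrix}, \quad \mathbf{x}_7 = \begin{bmatrix} .3016 \\ .5953 \\ .1031 \end{bmatrix}$$

$$\mathbf{x}_8 = \begin{bmatrix} .3008 \\ .5977 \\ .1016 \end{bmatrix}, \quad \mathbf{x}_9 = \begin{bmatrix} .3004 \\ .5988 \\ .1008 \end{bmatrix}, \quad \mathbf{x}_{10} = \begin{bmatrix} .3002 \\ .5994 \\ .1004 \end{bmatrix}, \quad \mathbf{x}_{11} = \begin{bmatrix} .3001 \\ .5997 \\ .1002 \end{bmatrix}$$

$$\mathbf{x}_{12} = \begin{bmatrix} .30005 \\ .59985 \\ .10010 \end{bmatrix}, \quad \mathbf{x}_{13} = \begin{bmatrix} .30002 \\ .59993 \\ .10005 \end{bmatrix}, \quad \mathbf{x}_{14} = \begin{bmatrix} .30001 \\ .59996 \\ .10002 \end{bmatrix}, \quad \mathbf{x}_{15} = \begin{bmatrix} .30001 \\ .59998 \\ .10001 \end{bmatrix}$$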

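These vectors seem to be approaching $\mathbf{q} = \begin{bmatrix} .3 \\ .6 \\ .1 \end{bmatrix}$. The probabilities are hardly changing
from one value of $k$ to the next. Observe that the following calculation is exact (with no
rounding error):

$$P\mathbf{q} = \begin{bmatrix} .5 & .2 & .3 \\ .3 & .8 & .3 \\ .2 & 0 & .4 \end{bmatrix}\begin{bmatrix} .3 \\ .6 \\ .1 \end{bmatrix} = \begin{bmatrix} .15 + .12 + .03 \\ .09 + .48 + .03 \\ .06 + 0 + .04 \end{bmatrix} = \begin{bmatrix} .30 \\ .60 \\ .10 \end{bmatrix} = \mathbf{q}$$

When the system is in state $\mathbf{q}$, there is no change in the system from one measurement
to the next. ■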
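The iteration in Example 3 is easy to reproduce by machine. The following is a minimal Python sketch, not part of the text, that repeats the computation $\mathbf{x}_{k+1} = P\mathbf{x}_k$ fifteen times; the helper name `mat_vec` and the list `history` are illustrative choices, and the arithmetic is ordinary floating point, so the last digits may differ slightly from the rounded values shown above.

```python
# Sketch: iterate x_{k+1} = P x_k for the matrix P of Example 3.
# mat_vec and history are hypothetical names chosen for this illustration.

def mat_vec(P, x):
    """Return the matrix-vector product P x (P is a list of rows)."""
    return [sum(p * xj for p, xj in zip(row, x)) for row in P]

P = [[0.5, 0.2, 0.3],
     [0.3, 0.8, 0.3],
     [0.2, 0.0, 0.4]]

x = [1.0, 0.0, 0.0]          # initial state x_0
history = [x]                # history[k] holds x_k
for _ in range(15):
    x = mat_vec(P, x)
    history.append(x)

for k in (1, 2, 3, 15):
    print(k, [round(v, 5) for v in history[k]])
```

The printed vectors can be compared with $\mathbf{x}_1$, $\mathbf{x}_2$, $\mathbf{x}_3$, and $\mathbf{x}_{15}$ in the solution.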


Steady-State Vectors 

If P is a stochastic matrix, then a steady-state vector (or equilibrium vector) for P is 
a probability vector q such that 

P q = q 

It can be shown that every stochastic matrix has a steady-state vector. In Example 3, q 
is a steady-state vector for P . 


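EXAMPLE 4 The probability vector $\mathbf{q} = \begin{bmatrix} .375 \\ .625 \end{bmatrix}$ is a steady-state vector for the
population migration matrix $M$ in Example 1, because

$$M\mathbf{q} = \begin{bmatrix} .95 & .03 \\ .05 & .97 \end{bmatrix}\begin{bmatrix} .375 \\ .625 \end{bmatrix} = \begin{bmatrix} .35625 + .01875 \\ .01875 + .60625 \end{bmatrix} = \begin{bmatrix} .375 \\ .625 \end{bmatrix} = \mathbf{q}$$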
If the total population of the metropolitan region in Example 1 is 1 million, then
$\mathbf{q}$ from Example 4 would correspond to having 375,000 persons in the city and
625,000 in the suburbs. At the end of one year, the migration out of the city would
be (.05)(375,000) = 18,750 persons, and the migration into the city from the suburbs
would be (.03)(625,000) = 18,750 persons. As a result, the population in the city would
remain the same. Similarly, the suburban population would be stable.

The next example shows how to find a steady-state vector. 


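EXAMPLE 5 Let $P = \begin{bmatrix} .6 & .3 \\ .4 & .7 \end{bmatrix}$. Find a steady-state vector for $P$.

SOLUTION First, solve the equation $P\mathbf{x} = \mathbf{x}$.

$$\begin{aligned} P\mathbf{x} - \mathbf{x} &= \mathbf{0} \\ P\mathbf{x} - I\mathbf{x} &= \mathbf{0} \qquad \text{Recall from Section 1.4 that } I\mathbf{x} = \mathbf{x}. \\ (P - I)\mathbf{x} &= \mathbf{0} \end{aligned}$$

For $P$ as above,

$$P - I = \begin{bmatrix} .6 & .3 \\ .4 & .7 \end{bmatrix} - \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -.4 & .3 \\ .4 & -.3 \end{bmatrix}$$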

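To find all solutions of $(P - I)\mathbf{x} = \mathbf{0}$, row reduce the augmented matrix:

$$\begin{bmatrix} -.4 & .3 & 0 \\ .4 & -.3 & 0 \end{bmatrix} \sim \begin{bmatrix} -.4 & .3 & 0 \\ 0 & 0 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & -3/4 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

Then $x_1 = \tfrac{3}{4}x_2$ and $x_2$ is free. The general solution is $x_2 \begin{bmatrix} 3/4 \\ 1 \end{bmatrix}$.

Next, choose a simple basis for the solution space. One obvious choice is $\begin{bmatrix} 3/4 \\ 1 \end{bmatrix}$,
but a better choice with no fractions is $\mathbf{w} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$ (corresponding to $x_2 = 4$).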


Finally, find a probability vector in the set of all solutions of Px = x. This process 
is easy, since every solution is a multiple of the solution w above. Divide w by the sum 
of its entries and obtain 

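$$\mathbf{q} = \begin{bmatrix} 3/7 \\ 4/7 \end{bmatrix}$$

As a check, compute

$$P\mathbf{q} = \begin{bmatrix} 6/10 & 3/10 \\ 4/10 & 7/10 \end{bmatrix}\begin{bmatrix} 3/7 \\ 4/7 \end{bmatrix} = \begin{bmatrix} 18/70 + 12/70 \\ 12/70 + 28/70 \end{bmatrix} = \begin{bmatrix} 30/70 \\ 40/70 \end{bmatrix} = \mathbf{q}$$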
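The procedure of Example 5 can be sketched in Python with exact rational arithmetic. This is only an illustration under stated assumptions: the function name `steady_state` is hypothetical, and instead of solving $(P - I)\mathbf{x} = \mathbf{0}$ and then rescaling, the sketch appends the equation "entries sum to 1" to the system and row reduces once, which gives the same vector when the steady-state vector is unique.

```python
from fractions import Fraction

def steady_state(P):
    """Return the steady-state vector of a stochastic matrix P, exactly.

    Assumption: the steady-state vector is unique (for example, P regular).
    Solves (P - I)x = 0 together with x_1 + ... + x_n = 1 by Gauss-Jordan
    elimination on the augmented matrix.
    """
    n = len(P)
    # Rows of the augmented matrix [P - I | 0]; decimals are converted
    # through str() so that 0.6 becomes exactly 3/5.
    rows = [[Fraction(str(P[i][j])) - (1 if i == j else 0) for j in range(n)]
            + [Fraction(0)] for i in range(n)]
    rows.append([Fraction(1)] * n + [Fraction(1)])   # normalization row

    pivot_row = 0
    for col in range(n):
        pr = next((r for r in range(pivot_row, len(rows))
                   if rows[r][col] != 0), None)
        if pr is None:
            raise ValueError("steady-state vector is not unique")
        rows[pivot_row], rows[pr] = rows[pr], rows[pivot_row]
        pivot = rows[pivot_row][col]
        rows[pivot_row] = [v / pivot for v in rows[pivot_row]]
        for r in range(len(rows)):
            if r != pivot_row and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b
                           for a, b in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    return [rows[i][n] for i in range(n)]

q = steady_state([[0.6, 0.3], [0.4, 0.7]])
print(q)   # the entries are Fraction objects
```

Exact fractions avoid the rounding questions that arise when the same elimination is done in floating point.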


The next theorem shows that what happened in Example 3 is typical of many
stochastic matrices. We say that a stochastic matrix is regular if some matrix power
$P^k$ contains only strictly positive entries. For $P$ in Example 3,

$$P^2 = \begin{bmatrix} .37 & .26 & .33 \\ .45 & .70 & .45 \\ .18 & .04 & .22 \end{bmatrix}$$

Since every entry in $P^2$ is strictly positive, $P$ is a regular stochastic matrix.
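The definition of a regular matrix suggests a direct test: examine successive powers until one has no zero entries. The Python sketch below is one way to do this, under assumptions that are not in the text. The name `smallest_positive_power` is hypothetical, only the pattern of zero and nonzero entries is tracked, and the search stops after $(n-1)^2 + 1$ powers, which is Wielandt's bound on the power needed for a nonnegative $n \times n$ matrix that has some strictly positive power.

```python
def smallest_positive_power(P):
    """Return the least k with every entry of P^k > 0, or None if no such
    power is found among k = 1, ..., (n-1)^2 + 1.

    Only the zero/nonzero pattern matters for nonnegative matrices, so the
    powers are tracked as boolean matrices.
    """
    n = len(P)
    base = [[P[i][j] > 0 for j in range(n)] for i in range(n)]
    power = base                                   # pattern of P^1
    for k in range(1, (n - 1) ** 2 + 2):
        if all(all(row) for row in power):
            return k
        # Pattern of P^(k+1): entry (i, j) is positive when some m has
        # (P^k)[i][m] > 0 and P[m][j] > 0.
        power = [[any(power[i][m] and base[m][j] for m in range(n))
                  for j in range(n)] for i in range(n)]
    return None

P = [[0.5, 0.2, 0.3],
     [0.3, 0.8, 0.3],
     [0.2, 0.0, 0.4]]
print(smallest_positive_power(P))                  # P of Example 3
print(smallest_positive_power([[1, 0], [0, 1]]))   # identity matrix
```

A return value of `None` means that no power in the searched range is strictly positive, so under the stated bound the matrix is not regular.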

Also, we say that a sequence of vectors $\{\mathbf{x}_k : k = 1, 2, \ldots\}$ converges to a vector
$\mathbf{q}$ as $k \to \infty$ if the entries in $\mathbf{x}_k$ can be made as close as desired to the corresponding
entries in $\mathbf{q}$ by taking $k$ sufficiently large.
THEOREM 18 If $P$ is an $n \times n$ regular stochastic matrix, then $P$ has a unique steady-state vector
$\mathbf{q}$. Further, if $\mathbf{x}_0$ is any initial state and $\mathbf{x}_{k+1} = P\mathbf{x}_k$ for $k = 0, 1, 2, \ldots$, then the
Markov chain $\{\mathbf{x}_k\}$ converges to $\mathbf{q}$ as $k \to \infty$.


This theorem is proved in standard texts on Markov chains. The amazing part of the 
theorem is that the initial state has no effect on the long-term behavior of the Markov 
chain. You will see later (in Section 5.2) why this is true for several stochastic matrices 
studied here. 


EXAMPLE 6 In Example 2, what percentage of the voters are likely to vote for the 
Republican candidate in some election many years from now, assuming that the election 
outcomes form a Markov chain? 


SOLUTION For computations by hand, the wrong approach is to pick some initial
vector $\mathbf{x}_0$ and compute $\mathbf{x}_1, \ldots, \mathbf{x}_k$ for some large value of $k$. You have no way of knowing
how many vectors to compute, and you cannot be sure of the limiting values of the
entries in $\mathbf{x}_k$.

The correct approach is to compute the steady-state vector and then appeal to
Theorem 18. Given $P$ as in Example 2, form $P - I$ by subtracting 1 from each diagonal
entry in $P$. Then row reduce the augmented matrix:

$$[\,(P - I) \quad \mathbf{0}\,] = \begin{bmatrix} -.3 & .1 & .3 & 0 \\ .2 & -.2 & .3 & 0 \\ .1 & .1 & -.6 & 0 \end{bmatrix}$$

Recall from earlier work with decimals that the arithmetic is simplified by multiplying
each row by 10.¹

$$\begin{bmatrix} -3 & 1 & 3 & 0 \\ 2 & -2 & 3 & 0 \\ 1 & 1 & -6 & 0 \end{bmatrix} \sim \cdots \sim \begin{bmatrix} 1 & 0 & -9/4 & 0 \\ 0 & 1 & -15/4 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

The general solution of $(P - I)\mathbf{x} = \mathbf{0}$ is $x_1 = \tfrac{9}{4}x_3$, $x_2 = \tfrac{15}{4}x_3$, and $x_3$ is free. Choosing
$x_3 = 4$, we obtain a basis for the solution space whose entries are integers, and from this
we easily find the steady-state vector whose entries sum to 1:



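$$\mathbf{w} = \begin{bmatrix} 9 \\ 15 \\ 4 \end{bmatrix}, \quad \text{and} \quad \mathbf{q} = \begin{bmatrix} 9/28 \\ 15/28 \\ 4/28 \end{bmatrix} \approx \begin{bmatrix} .32 \\ .54 \\ .14 \end{bmatrix}$$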

The entries in q describe the distribution of votes at an election to be held many years 
from now (assuming the stochastic matrix continues to describe the changes from one 
election to the next). Thus, eventually, about 54% of the vote will be for the Republican 
candidate. ■ 
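As in Example 5, the answer can be checked by multiplying. The computation below is an added verification, not part of the original solution; it assumes $P$ is recovered from the displayed matrix $P - I$ by adding 1 back to each diagonal entry.

```latex
P\mathbf{q}
= \begin{bmatrix} .7 & .1 & .3 \\ .2 & .8 & .3 \\ .1 & .1 & .4 \end{bmatrix}
  \begin{bmatrix} 9/28 \\ 15/28 \\ 4/28 \end{bmatrix}
= \frac{1}{28}\begin{bmatrix} 6.3 + 1.5 + 1.2 \\ 1.8 + 12 + 1.2 \\ .9 + 1.5 + 1.6 \end{bmatrix}
= \frac{1}{28}\begin{bmatrix} 9 \\ 15 \\ 4 \end{bmatrix}
= \mathbf{q}
```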


¹ Warning: Don't multiply only $P$ by 10. Instead, multiply the augmented matrix for equation
$(P - I)\mathbf{x} = \mathbf{0}$ by 10.
NUMERICAL NOTE

You may have noticed that if $\mathbf{x}_{k+1} = P\mathbf{x}_k$ for $k = 0, 1, \ldots$, then

$$\mathbf{x}_2 = P\mathbf{x}_1 = P(P\mathbf{x}_0) = P^2\mathbf{x}_0,$$

and, in general,

$$\mathbf{x}_k = P^k\mathbf{x}_0 \quad \text{for } k = 0, 1, \ldots$$

To compute a specific vector such as $\mathbf{x}_3$, fewer arithmetic operations are needed
to compute $\mathbf{x}_1$, $\mathbf{x}_2$, and $\mathbf{x}_3$, rather than $P^3$ and $P^3\mathbf{x}_0$. However, if $P$ is small (say,
$30 \times 30$), the machine computation time is insignificant for both methods, and a
command to compute $P^3\mathbf{x}_0$ might be preferred because it requires fewer human
keystrokes.
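The two routes to $\mathbf{x}_3$ described in the note can be compared directly. The Python sketch below is illustrative only (the helper names `mat_vec` and `mat_mul` are assumptions) and uses the matrix of Example 3.

```python
def mat_vec(P, x):
    """Matrix-vector product P x."""
    return [sum(p * xj for p, xj in zip(row, x)) for row in P]

def mat_mul(A, B):
    """Matrix-matrix product A B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

P = [[0.5, 0.2, 0.3],
     [0.3, 0.8, 0.3],
     [0.2, 0.0, 0.4]]
x0 = [1.0, 0.0, 0.0]

# Method 1: three matrix-vector products, x_1, x_2, x_3 in turn.
step_by_step = x0
for _ in range(3):
    step_by_step = mat_vec(P, step_by_step)

# Method 2: form P^3 first, then multiply once by x_0.
P3 = mat_mul(P, mat_mul(P, P))
via_power = mat_vec(P3, x0)

print(step_by_step)
print(via_power)
```

With the straightforward products used here and $n = 3$, method 1 needs $3n^2 = 27$ scalar multiplications, while method 2 needs $2n^3 + n^2 = 63$.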


PRACTICE PROBLEMS

1. Suppose the residents of a metropolitan region move according to the probabilities
in the migration matrix $M$ in Example 1 and a resident is chosen “at random.” Then
a state vector for a certain year may be interpreted as giving the probabilities that the
person is a city resident or a suburban resident at that time.

a. Suppose the person chosen is a city resident now, so that $\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$. What is the
likelihood that the person will live in the suburbs next year?

b. What is the likelihood that the person will be living in the suburbs in two years?

2. Let $P = \begin{bmatrix} .6 & .2 \\ .4 & .8 \end{bmatrix}$ and $\mathbf{q} = \begin{bmatrix} .3 \\ .7 \end{bmatrix}$. Is $\mathbf{q}$ a steady-state vector for $P$?

3. What percentage of the population in Example 1 will live in the suburbs after many
years?


4.9 EXERCISES 

1. A small remote village receives radio broadcasts from two 
radio stations, a news station and a music station. Of the 
listeners who are tuned to the news station, 70% will remain 
listening to the news after the station break that occurs each 
half hour, while 30% will switch to the music station at the 
station break. Of the listeners who are tuned to the music 
station, 60% will switch to the news station at the station 
break, while 40% will remain listening to the music. Suppose 
everyone is listening to the news at 8:15 A.M. 

a. Give the stochastic matrix that describes how the radio 
listeners tend to change stations at each station break. 
Label the rows and columns. 

b. Give the initial state vector. 

c. What percentage of the listeners will be listening to the 
music station at 9:25 A.M. (after the station breaks at 8:30 
and 9:00 A.M.)? 

2. A laboratory animal may eat any one of three foods each day.
Laboratory records show that if the animal chooses one food
on one trial, it will choose the same food on the next trial with
a probability of 50%, and it will choose the other foods on the
next trial with equal probabilities of 25%.

a. What is the stochastic matrix for this situation? 

b. If the animal chooses food #1 on an initial trial, what is 
the probability that it will choose food #2 on the second 
trial after the initial trial? 



3. On any given day, a student is either healthy or ill. Of
the students who are healthy today, 95% will be healthy
tomorrow. Of the students who are ill today, 55% will still
be ill tomorrow.


a. What is the stochastic matrix for this situation? 

b. Suppose 20% of the students are ill on Monday. What 
fraction or percentage of the students are likely to be ill 
on Tuesday? On Wednesday? 

c. If a student is well today, what is the probability that he 
or she will be well two days from now? 

4. The weather in Columbus is either good, indifferent, or bad 
on any given day. If the weather is good today, there is 
a 60% chance the weather will be good tomorrow, a 30% 
chance the weather will be indifferent, and a 10% chance the 
weather will be bad. If the weather is indifferent today, it will 
be good tomorrow with probability .40 and indifferent with 
probability .30. Finally, if the weather is bad today, it will 
be good tomorrow with probability .40 and indifferent with 
probability .50. 

a. What is the stochastic matrix for this situation? 

b. Suppose there is a 50% chance of good weather today 
and a 50% chance of indifferent weather. What are the 
chances of bad weather tomorrow? 

c. Suppose the predicted weather for Monday is 40% indifferent
weather and 60% bad weather. What are the
chances for good weather on Wednesday?


In Exercises 5-8, find the steady-state vector. 







9. Determine if $P = \begin{bmatrix} .2 & 1 \\ .8 & 0 \end{bmatrix}$ is a regular stochastic matrix.

10. Determine if $P = \begin{bmatrix} 1 & .2 \\ 0 & .8 \end{bmatrix}$ is a regular stochastic matrix.


11. a. Find the steady-state vector for the Markov chain in 

Exercise 1. 

b. At some time late in the day, what fraction of the listeners 
will be listening to the news? 

12. Refer to Exercise 2. Which food will the animal prefer after 
many trials? 

13. a. Find the steady-state vector for the Markov chain in 

Exercise 3. 

b. What is the probability that after many days a specific 
student is ill? Does it matter if that person is ill today? 

14. Refer to Exercise 4. In the long run, how likely is it for the 
weather in Columbus to be good on a given day? 

15. [M] The Demographic Research Unit of the California State
Department of Finance supplied data for the following migration
matrix, which describes the movement of the United
States population during 2012. In 2012, about 12.5% of the
total population lived in California. What percentage of the
total population would eventually live in California if the
listed migration probabilities were to remain constant over
many years?

            From:
        CA      Rest of U.S.      To:
     [ .9871      .0027 ]         California
     [ .0129      .9973 ]         Rest of U.S.

16. [M] In Detroit, Hertz Rent A Car has a fleet of about
2000 cars. The pattern of rental and return locations is given
by the fractions in the table below. On a typical day, about
how many cars will be rented or ready to rent from the
downtown location?

               Cars Rented from:
        City                    Metro
        Airport    Downtown     Airport      Returned to:
     [   .90         .01          .09   ]    City Airport
     [   .01         .90          .01   ]    Downtown
     [   .09         .09          .90   ]    Metro Airport


17. Let P be an n x n stochastic matrix. The following argument 
shows that the equation Px = x has a nontrivial solution. (In 
fact, a steady-state solution exists with nonnegative entries. A 
proof is given in some advanced texts.) Justify each assertion 
below. (Mention a theorem when appropriate.) 

a. If all the other rows of $P - I$ are added to the bottom
row, the result is a row of zeros.

b. The rows of $P - I$ are linearly dependent.

c. The dimension of the row space of $P - I$ is less than $n$.

d. $P - I$ has a nontrivial null space.


18. Show that every $2 \times 2$ stochastic matrix has at least one
steady-state vector. Any such matrix can be written in the
form $P = \begin{bmatrix} 1 - \alpha & \beta \\ \alpha & 1 - \beta \end{bmatrix}$, where $\alpha$ and $\beta$ are constants
between 0 and 1. (There are two linearly independent steady-state
vectors if $\alpha = \beta = 0$. Otherwise, there is only one.)


19. Let $S$ be the $1 \times n$ row matrix with a 1 in each column,

$S = [\,1 \;\; 1 \;\; \cdots \;\; 1\,]$

a. Explain why a vector $\mathbf{x}$ in $\mathbb{R}^n$ is a probability vector if and
only if its entries are nonnegative and $S\mathbf{x} = 1$. (A $1 \times 1$
matrix such as the product $S\mathbf{x}$ is usually written without
the matrix bracket symbols.)

b. Let $P$ be an $n \times n$ stochastic matrix. Explain why
$SP = S$.

c. Let $P$ be an $n \times n$ stochastic matrix, and let $\mathbf{x}$ be a
probability vector. Show that $P\mathbf{x}$ is also a probability
vector.

20. Use Exercise 19 to show that if $P$ is an $n \times n$ stochastic
matrix, then so is $P^2$.
21. [M] Examine powers of a regular stochastic matrix.

a. Compute $P^k$ for $k = 2, 3, 4, 5$, when

$$P = \begin{bmatrix} .3355 & .3682 & .3067 & .0389 \\ .2663 & .2723 & .3277 & .5451 \\ .1935 & .1502 & .1589 & .2395 \\ .2047 & .2093 & .2067 & .1765 \end{bmatrix}$$

Display calculations to four decimal places. What happens
to the columns of $P^k$ as $k$ increases? Compute the
steady-state vector for $P$.

b. Compute $Q^k$ for $k = 10, 20, \ldots, 80$, when

$$Q = \begin{bmatrix} .97 & .05 & .10 \\ 0 & .90 & .05 \\ .03 & .05 & .85 \end{bmatrix}$$

(Stability for $Q^k$ to four decimal places may require
$k = 116$ or more.) Compute the steady-state vector for $Q$.
Conjecture what might be true for any regular stochastic
matrix.

c. Use Theorem 18 to explain what you found in parts (a)
and (b).

22. [M] Compare two methods for finding the steady-state vector 
q of a regular stochastic matrix P : (1) computing q as in 
Example 5, or (2) computing P k for some large value of k 
and using one of the columns of P k as an approximation for 
q. [The Study Guide describes a program nulbasis that almost 
automates method (1).] 

Experiment with the largest random stochastic matrices 
your matrix program will allow, and use k = 100 or some 
other large value. For each method, describe the time you 
need to enter the keystrokes and run your program. (Some
versions of MATLAB have commands flops and tic
... toc that record the number of floating point operations
and the total elapsed time MATLAB uses.) Contrast the
advantages of each method, and state which you prefer.


SOLUTIONS TO PRACTICE PROBLEMS 


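1. a. Since 5% of the city residents will move to the suburbs within one year, there
is a 5% chance of choosing such a person. Without further knowledge about the
person, we say that there is a 5% chance the person will move to the suburbs.
This fact is contained in the second entry of the state vector $\mathbf{x}_1$, where

$$\mathbf{x}_1 = M\mathbf{x}_0 = \begin{bmatrix} .95 & .03 \\ .05 & .97 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} .95 \\ .05 \end{bmatrix}$$

b. The likelihood that the person will be living in the suburbs after two years is
9.6%, because

$$\mathbf{x}_2 = M\mathbf{x}_1 = \begin{bmatrix} .95 & .03 \\ .05 & .97 \end{bmatrix}\begin{bmatrix} .95 \\ .05 \end{bmatrix} = \begin{bmatrix} .904 \\ .096 \end{bmatrix}$$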

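2. The steady-state vector satisfies $P\mathbf{x} = \mathbf{x}$. Since

$$P\mathbf{q} = \begin{bmatrix} .6 & .2 \\ .4 & .8 \end{bmatrix}\begin{bmatrix} .3 \\ .7 \end{bmatrix} = \begin{bmatrix} .32 \\ .68 \end{bmatrix} \neq \mathbf{q}$$

we conclude that $\mathbf{q}$ is not the steady-state vector for $P$.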

3. $M$ in Example 1 is a regular stochastic matrix because its entries are all strictly
positive. So we may use Theorem 18. We already know the steady-state vector from
Example 4. Thus the population distribution vectors converge to

$$\mathbf{q} = \begin{bmatrix} .375 \\ .625 \end{bmatrix}$$

Eventually 62.5% of the population will live in the suburbs.


CHAPTER 4 SUPPLEMENTARY EXERCISES 


1. Mark each statement True or False. Justify each answer.
(If true, cite appropriate facts or theorems. If false, explain
why or give a counterexample that shows why the statement
is not true in every case.) In parts (a)-(f), $\mathbf{v}_1, \ldots, \mathbf{v}_p$ are
vectors in a nonzero finite-dimensional vector space $V$, and
$S = \{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$.

a. The set of all linear combinations of $\mathbf{v}_1, \ldots, \mathbf{v}_p$ is a vector
space.

b. If $\{\mathbf{v}_1, \ldots, \mathbf{v}_{p-1}\}$ spans $V$, then $S$ spans $V$.

c. If $\{\mathbf{v}_1, \ldots, \mathbf{v}_{p-1}\}$ is linearly independent, then so is $S$.

d. If $S$ is linearly independent, then $S$ is a basis for $V$.

e. If Span $S = V$, then some subset of $S$ is a basis for $V$.

f. If dim $V = p$ and Span $S = V$, then $S$ cannot be linearly
dependent.

g. A plane in $\mathbb{R}^3$ is a two-dimensional subspace.

h. The nonpivot columns of a matrix are always linearly
dependent.

i. Row operations on a matrix $A$ can change the linear
dependence relations among the rows of $A$.

j. Row operations on a matrix can change the null space.

k. The rank of a matrix equals the number of nonzero rows.

l. If an $m \times n$ matrix $A$ is row equivalent to an echelon matrix
$U$ and if $U$ has $k$ nonzero rows, then the dimension
of the solution space of $A\mathbf{x} = \mathbf{0}$ is $m - k$.

m. If $B$ is obtained from a matrix $A$ by several elementary
row operations, then rank $B$ = rank $A$.

n. The nonzero rows of a matrix $A$ form a basis for Row $A$.

o. If matrices $A$ and $B$ have the same reduced echelon form,
then Row $A$ = Row $B$.

p. If $H$ is a subspace of $\mathbb{R}^3$, then there is a $3 \times 3$ matrix $A$
such that $H$ = Col $A$.

q. If $A$ is $m \times n$ and rank $A = m$, then the linear transformation
$\mathbf{x} \mapsto A\mathbf{x}$ is one-to-one.

r. If $A$ is $m \times n$ and the linear transformation $\mathbf{x} \mapsto A\mathbf{x}$ is
onto, then rank $A = m$.

s. A change-of-coordinates matrix is always invertible.

t. If $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ and $\mathcal{C} = \{\mathbf{c}_1, \ldots, \mathbf{c}_n\}$ are bases for a
vector space $V$, then the $j$th column of the change-of-coordinates
matrix $P_{\mathcal{C} \leftarrow \mathcal{B}}$ is the coordinate vector $[\mathbf{c}_j]_{\mathcal{B}}$.

2. Find a basis for the set of all vectors of the form

$$\begin{bmatrix} a - 2b + 5c \\ 2a + 5b - 8c \\ -a - 4b + 7c \\ 3a + b + c \end{bmatrix}$$

(Be careful.)



-2 


1 


b\ 


3. Let Ui = 

1 

VO 

_1 

, U 2 = 

1 

bo 

1 _ 

, b = 

b 2 

_ bi _ 

, and 


$W = \operatorname{Span}\{\mathbf{u}_1, \mathbf{u}_2\}$. Find an implicit description of $W$; that
is, find a set of one or more homogeneous equations that
characterize the points of $W$. [Hint: When is $\mathbf{b}$ in $W$?]


4. Explain what is wrong with the following discussion: Let
$\mathbf{f}(t) = 3 + t$ and $\mathbf{g}(t) = 3t + t^2$, and note that $\mathbf{g}(t) = t\,\mathbf{f}(t)$.
Then $\{\mathbf{f}, \mathbf{g}\}$ is linearly dependent because $\mathbf{g}$ is a multiple of $\mathbf{f}$.

5. Consider the polynomials $\mathbf{p}_1(t) = 1 + t$, $\mathbf{p}_2(t) = 1 - t$,
$\mathbf{p}_3(t) = 4$, $\mathbf{p}_4(t) = t + t^2$, and $\mathbf{p}_5(t) = 1 + 2t + t^2$, and
let $H$ be the subspace of $\mathbb{P}_5$ spanned by the set
$S = \{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3, \mathbf{p}_4, \mathbf{p}_5\}$. Use the method described in the
proof of the Spanning Set Theorem (Section 4.3) to produce
a basis for $H$. (Explain how to select appropriate members
of $S$.)

6. Suppose $\mathbf{p}_1$, $\mathbf{p}_2$, $\mathbf{p}_3$, and $\mathbf{p}_4$ are specific polynomials that span
a two-dimensional subspace $H$ of $\mathbb{P}_5$. Describe how one can
find a basis for $H$ by examining the four polynomials and
making almost no computations.

7. What would you have to know about the solution set of a
homogeneous system of 18 linear equations in 20 variables
in order to know that every associated nonhomogeneous
equation has a solution? Discuss.

8. Let $H$ be an $n$-dimensional subspace of an $n$-dimensional
vector space $V$. Explain why $H = V$.

9. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation.

a. What is the dimension of the range of $T$ if $T$ is a one-to-one
mapping? Explain.

b. What is the dimension of the kernel of $T$ (see Section 4.2)
if $T$ maps $\mathbb{R}^n$ onto $\mathbb{R}^m$? Explain.

10. Let $S$ be a maximal linearly independent subset of a vector
space $V$. That is, $S$ has the property that if a vector not in $S$
is adjoined to $S$, then the new set will no longer be linearly
independent. Prove that $S$ must be a basis for $V$. [Hint: What
if $S$ were linearly independent but not a basis of $V$?]

11. Let $S$ be a finite minimal spanning set of a vector space $V$.
That is, $S$ has the property that if a vector is removed from
$S$, then the new set will no longer span $V$. Prove that $S$ must
be a basis for $V$.

Exercises 12-17 develop properties of rank that are sometimes
needed in applications. Assume the matrix $A$ is $m \times n$.

12. Show from parts (a) and (b) that rank $AB$ cannot exceed the
rank of $A$ or the rank of $B$. (In general, the rank of a product of
matrices cannot exceed the rank of any factor in the product.)

a. Show that if $B$ is $n \times p$, then rank $AB \le$ rank $A$. [Hint:
Explain why every vector in the column space of $AB$ is in
the column space of $A$.]

b. Show that if $B$ is $n \times p$, then rank $AB \le$ rank $B$. [Hint:
Use part (a) to study rank $(AB)^T$.]

13. Show that if $P$ is an invertible $m \times m$ matrix, then
rank $PA$ = rank $A$. [Hint: Apply Exercise 12 to $PA$ and
$P^{-1}(PA)$.]

14. Show that if $Q$ is invertible, then rank $AQ$ = rank $A$. [Hint:
Use Exercise 13 to study rank $(AQ)^T$.]

15. Let $A$ be an $m \times n$ matrix, and let $B$ be an $n \times p$ matrix
such that $AB = 0$. Show that rank $A$ + rank $B \le n$. [Hint:
One of the four subspaces Nul $A$, Col $A$, Nul $B$, and Col $B$
is contained in one of the other three subspaces.]

16. If $A$ is an $m \times n$ matrix of rank $r$, then a rank factorization
of $A$ is an equation of the form $A = CR$, where $C$ is an
$m \times r$ matrix of rank $r$ and $R$ is an $r \times n$ matrix of rank $r$.
Such a factorization always exists (Exercise 38 in Section
4.6). Given any two $m \times n$ matrices $A$ and $B$, use rank
factorizations of $A$ and $B$ to prove that

rank $(A + B) \le$ rank $A$ + rank $B$

[Hint: Write $A + B$ as the product of two partitioned matrices.]

17. A submatrix of a matrix $A$ is any matrix that results from
deleting some (or no) rows and/or columns of $A$. It can be
shown that $A$ has rank $r$ if and only if $A$ contains an invertible
$r \times r$ submatrix and no larger square submatrix is invertible.
Demonstrate part of this statement by explaining (a) why
an $m \times n$ matrix $A$ of rank $r$ has an $m \times r$ submatrix $A_1$ of
rank $r$, and (b) why $A_1$ has an invertible $r \times r$ submatrix $A_2$.

The concept of rank plays an important role in the design of 
engineering control systems, such as the space shuttle system 
mentioned in this chapter’s introductory example. A state-space 
model of a control system includes a difference equation of the 
form 


$$\mathbf{x}_{k+1} = A\mathbf{x}_k + B\mathbf{u}_k \quad \text{for } k = 0, 1, \ldots \tag{1}$$

where $A$ is $n \times n$, $B$ is $n \times m$, $\{\mathbf{x}_k\}$ is a sequence of “state vectors”
in $\mathbb{R}^n$ that describe the state of the system at discrete times, and
$\{\mathbf{u}_k\}$ is a control, or input, sequence. The pair $(A, B)$ is said to be
controllable if

$$\operatorname{rank}\,[\,B \quad AB \quad A^2B \quad \cdots \quad A^{n-1}B\,] = n \tag{2}$$

The matrix that appears in (2) is called the controllability matrix
for the system. If $(A, B)$ is controllable, then the system can be
controlled, or driven from the state $\mathbf{0}$ to any specified state $\mathbf{v}$ (in
$\mathbb{R}^n$) in at most $n$ steps, simply by choosing an appropriate control
sequence in $\mathbb{R}^m$. This fact is illustrated in Exercise 18 for $n = 4$
and $m = 2$. For a further discussion of controllability, see this
text's web site (Case Study for Chapter 4).
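Condition (2) can be tested mechanically. The Python sketch below is only an illustration under assumptions that are not in the text: the function names are hypothetical, the rank is found by exact elimination with fractions (so integer or rational entries are expected), and the small matrices at the end are made-up examples rather than the pairs in the exercises.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Matrix product A B for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def rank(M):
    """Rank of M by forward elimination in exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def controllability_matrix(A, B):
    """Return the partitioned matrix [B  AB  A^2 B  ...  A^(n-1) B]."""
    n = len(A)
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(mat_mul(A, blocks[-1]))
    return [sum((blk[i] for blk in blocks), []) for i in range(n)]

def is_controllable(A, B):
    return rank(controllability_matrix(A, B)) == len(A)

# Hypothetical 2 x 2 examples (not taken from the exercises):
print(is_controllable([[0, 1], [0, 0]], [[0], [1]]))
print(is_controllable([[1, 0], [0, 1]], [[1], [0]]))
```

For matrices with decimal entries, a numerical rank routine with a tolerance would be the usual choice; the exact version here is kept for clarity.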



18. Suppose $A$ is a $4 \times 4$ matrix and $B$ is a $4 \times 2$ matrix, and let
$\mathbf{u}_0, \ldots, \mathbf{u}_3$ represent a sequence of input vectors in $\mathbb{R}^2$.

a. Set $\mathbf{x}_0 = \mathbf{0}$, compute $\mathbf{x}_1, \ldots, \mathbf{x}_4$ from equation (1), and
write a formula for $\mathbf{x}_4$ involving the controllability matrix
$M$ appearing in equation (2). (Note: The matrix $M$ is
constructed as a partitioned matrix. Its overall size here
is $4 \times 8$.)

b. Suppose $(A, B)$ is controllable and $\mathbf{v}$ is any vector in $\mathbb{R}^4$.
Explain why there exists a control sequence $\mathbf{u}_0, \ldots, \mathbf{u}_3$ in
$\mathbb{R}^2$ such that $\mathbf{x}_4 = \mathbf{v}$.

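Determine if the matrix pairs in Exercises 19-22 are controllable.

19. $A = \begin{bmatrix} .9 & 1 & 0 \\ 0 & -.9 & 0 \\ 0 & 0 & .5 \end{bmatrix}$, $B = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$

20. $A = \begin{bmatrix} .8 & -.3 & 0 \\ .2 & .5 & 1 \\ 0 & 0 & -.5 \end{bmatrix}$, $B = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}$

21. [M] $A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ -2 & -4.2 & -4.8 & -3.6 \end{bmatrix}$, $B = \begin{bmatrix} 1 \\ 0 \\ 0 \\ -1 \end{bmatrix}$

22. [M] $A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ -1 & -13 & -12.2 & -1.5 \end{bmatrix}$, $B = \begin{bmatrix} 1 \\ 0 \\ 0 \\ -1 \end{bmatrix}$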






















Eigenvalues and 
Eigenvectors 


INTRODUCTORY EXAMPLE 

Dynamical Systems and Spotted Owls 



In 1990, the northern spotted owl became the center of
a nationwide controversy over the use and misuse of the
majestic forests in the Pacific Northwest. Environmentalists
convinced the federal government that the owl was
threatened with extinction if logging continued in the old-growth
forests (with trees more than 200 years old), where
the owls prefer to live. The timber industry, anticipating
the loss of 30,000 to 100,000 jobs as a result of new
government restrictions on logging, argued that the owl
should not be classified as a “threatened species” and cited
a number of published scientific reports to support its case.¹

Caught in the crossfire of the two lobbying groups,
mathematical ecologists intensified their drive to understand
the population dynamics of the spotted owl. The life
cycle of a spotted owl divides naturally into three stages: 
juvenile (up to 1 year old), subadult (1 to 2 years), and 
adult (older than 2 years). The owls mate for life during 
the subadult and adult stages, begin to breed as adults, 
and live for up to 20 years. Each owl pair requires about 
1000 hectares (4 square miles) for its own home territory. 
A critical time in the life cycle is when the juveniles leave 
the nest. To survive and become a subadult, a juvenile must 
successfully find a new home range (and usually a mate). 


A first step in studying the population dynamics is to 
model the population at yearly intervals, at times denoted 
by k = 0,1,2,.... Usually, one assumes that there is a 1:1 
ratio of males to females in each life stage and counts only 
the females. The population at year $k$ can be described
by a vector $\mathbf{x}_k = (j_k, s_k, a_k)$, where $j_k$, $s_k$, and $a_k$ are the
numbers of females in the juvenile, subadult, and adult
stages, respectively.

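Using actual field data from demographic studies,
R. Lamberson and co-workers considered the following
stage-matrix model:²

$$\begin{bmatrix} j_{k+1} \\ s_{k+1} \\ a_{k+1} \end{bmatrix} = \begin{bmatrix} 0 & 0 & .33 \\ .18 & 0 & 0 \\ 0 & .71 & .94 \end{bmatrix}\begin{bmatrix} j_k \\ s_k \\ a_k \end{bmatrix}$$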

Here the number of new juvenile females in year k + 1 
is .33 times the number of adult females in year k (based 
on the average birth rate per owl pair). Also, 18% of the 
juveniles survive to become subadults, and 71% of the 
subadults and 94% of the adults survive to be counted as 
adults. 
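One step of the stage-matrix model can be sketched in Python as follows. The starting population `x0` is a made-up illustration, not field data, and the helper name `step` is an assumption.

```python
# Stage matrix from the Lamberson model quoted above.
A = [[0.00, 0.00, 0.33],
     [0.18, 0.00, 0.00],
     [0.00, 0.71, 0.94]]

def step(A, x):
    """Advance the population vector one year: x_{k+1} = A x_k."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

# Hypothetical counts of juvenile, subadult, and adult females.
x0 = [100.0, 50.0, 200.0]
x1 = step(A, x0)
print(x1)   # juveniles, subadults, adults one year later
```

Here the new juveniles come only from adults (.33 × 200), the new subadults only from juveniles (.18 × 100), and the new adults from surviving subadults and adults (.71 × 50 + .94 × 200).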

The stage-matrix model is a difference equation of the
form $\mathbf{x}_{k+1} = A\mathbf{x}_k$. Such an equation is often called a
dynamical system (or a discrete linear dynamical
system) because it describes the changes in a system as
time passes.

¹ “The Great Spotted Owl War,” Reader's Digest, November 1992,
pp. 91-95.

² R. H. Lamberson, R. McKelvey, B. R. Noon, and C. Voss, “A Dynamic
Analysis of the Viability of the Northern Spotted Owl in a Fragmented
Forest Environment,” Conservation Biology 6 (1992), 505-512. Also, a
private communication from Professor Lamberson, 1993.

The 18% juvenile survival rate in the Lamberson stage 
matrix is the entry affected most by the amount of old- 
growth forest available. Actually, 60% of the juveniles 
normally survive to leave the nest, but in the Willow 
Creek region of California studied by Lamberson and his 
colleagues, only 30% of the juveniles that left the nest were 
able to find new home ranges. The rest perished during the 
search process. 


A significant reason for the failure of owls to find new 
home ranges is the increasing fragmentation of old-growth 
timber stands due to clear-cutting of scattered areas on 
the old-growth land. When an owl leaves the protective 
canopy of the forest and crosses a clear-cut area, the risk of 
attack by predators increases dramatically. Section 5.6 will 
show that the model described above predicts the eventual 
demise of the spotted owl, but that if 50% of the juveniles 
who survive to leave the nest also find new home ranges, 
then the owl population will thrive. 




The goal of this chapter is to dissect the action of a linear transformation $\mathbf{x} \mapsto A\mathbf{x}$ into
elements that are easily visualized. Except for a brief digression in Section 5.4, all matrices
in the chapter are square. The main applications described here are to discrete dynamical 
systems, including the spotted owls discussed above. However, the basic concepts — 
eigenvectors and eigenvalues — are useful throughout pure and applied mathematics, 
and they appear in settings far more general than we consider here. Eigenvalues are also 
used to study differential equations and continuous dynamical systems, they provide 
critical information in engineering design, and they arise naturally in fields such as 
physics and chemistry. 


5.1 EIGENVECTORS AND EIGENVALUES 


Although a transformation $\mathbf{x} \mapsto A\mathbf{x}$ may move vectors in a variety of directions, it often
happens that there are special vectors on which the action of $A$ is quite simple.


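EXAMPLE 1 Let $A = \begin{bmatrix} 3 & -2 \\ 1 & 0 \end{bmatrix}$, $\mathbf{u} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$, and $\mathbf{v} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$. The images of $\mathbf{u}$ and
$\mathbf{v}$ under multiplication by $A$ are shown in Figure 1. In fact, $A\mathbf{v}$ is just $2\mathbf{v}$. So $A$ only
“stretches,” or dilates, $\mathbf{v}$.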



FIGURE 1 Effects of multiplication by A. 


As another example, readers of Section 4.9 will recall that if $A$ is a stochastic matrix,
then the steady-state vector $\mathbf{q}$ for $A$ satisfies the equation $A\mathbf{x} = \mathbf{x}$. That is, $A\mathbf{q} = 1 \cdot \mathbf{q}$.
This section studies equations such as

$$A\mathbf{x} = 2\mathbf{x} \quad \text{or} \quad A\mathbf{x} = -4\mathbf{x}$$

where special vectors are transformed by $A$ into scalar multiples of themselves.


DEFINITION An eigenvector of an $n \times n$ matrix $A$ is a nonzero vector $\mathbf{x}$ such that $A\mathbf{x} = \lambda\mathbf{x}$
for some scalar $\lambda$. A scalar $\lambda$ is called an eigenvalue of $A$ if there is a nontrivial
solution $\mathbf{x}$ of $A\mathbf{x} = \lambda\mathbf{x}$; such an $\mathbf{x}$ is called an eigenvector corresponding to $\lambda$.¹


It is easy to determine if a given vector is an eigenvector of a matrix. It is also easy 
to decide if a specified scalar is an eigenvalue. 



[Figure: $A\mathbf{u} = -4\mathbf{u}$, but $A\mathbf{v} \neq \lambda\mathbf{v}$.]


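EXAMPLE 2 Let $A = \begin{bmatrix} 1 & 6 \\ 5 & 2 \end{bmatrix}$, $\mathbf{u} = \begin{bmatrix} 6 \\ -5 \end{bmatrix}$, and $\mathbf{v} = \begin{bmatrix} 3 \\ -2 \end{bmatrix}$. Are $\mathbf{u}$ and $\mathbf{v}$ eigenvectors
of $A$?

SOLUTION

$$A\mathbf{u} = \begin{bmatrix} 1 & 6 \\ 5 & 2 \end{bmatrix}\begin{bmatrix} 6 \\ -5 \end{bmatrix} = \begin{bmatrix} -24 \\ 20 \end{bmatrix} = -4\begin{bmatrix} 6 \\ -5 \end{bmatrix} = -4\mathbf{u}$$

$$A\mathbf{v} = \begin{bmatrix} 1 & 6 \\ 5 & 2 \end{bmatrix}\begin{bmatrix} 3 \\ -2 \end{bmatrix} = \begin{bmatrix} -9 \\ 11 \end{bmatrix} \neq \lambda\begin{bmatrix} 3 \\ -2 \end{bmatrix}$$

Thus $\mathbf{u}$ is an eigenvector corresponding to an eigenvalue ($-4$), but $\mathbf{v}$ is not an eigenvector
of $A$, because $A\mathbf{v}$ is not a multiple of $\mathbf{v}$.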
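The test used in Example 2 is easy to automate: compute $A\mathbf{x}$ and check whether it is a scalar multiple of $\mathbf{x}$. The Python sketch below is an illustration only; the function name `eigenvalue_for` is hypothetical, and the sketch assumes integer (or rational) entries so that the comparison is exact.

```python
from fractions import Fraction

def eigenvalue_for(A, x):
    """Return the scalar lam with A x = lam x if x is an eigenvector of A;
    return None otherwise (including when x is the zero vector)."""
    if all(v == 0 for v in x):
        return None                       # eigenvectors must be nonzero
    Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    # The only possible scalar is fixed by the first nonzero entry of x.
    i = next(k for k, v in enumerate(x) if v != 0)
    lam = Fraction(Ax[i], x[i])
    if all(Ax[k] == lam * x[k] for k in range(len(x))):
        return lam
    return None

A = [[1, 6],
     [5, 2]]
print(eigenvalue_for(A, [6, -5]))   # u from Example 2
print(eigenvalue_for(A, [3, -2]))   # v from Example 2
```

With floating-point entries, the equality test would need to be replaced by a comparison within a tolerance.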


EXAMPLE 3 Show that 7 is an eigenvalue of matrix A in Example 2, and find the 
corresponding eigenvectors. 


SOLUTION The scalar 7 is an eigenvalue of $A$ if and only if the equation

$$A\mathbf{x} = 7\mathbf{x} \tag{1}$$

has a nontrivial solution. But (1) is equivalent to $A\mathbf{x} - 7\mathbf{x} = \mathbf{0}$, or

$$(A - 7I)\mathbf{x} = \mathbf{0} \tag{2}$$

To solve this homogeneous equation, form the matrix

$$A - 7I = \begin{bmatrix} 1 & 6 \\ 5 & 2 \end{bmatrix} - \begin{bmatrix} 7 & 0 \\ 0 & 7 \end{bmatrix} = \begin{bmatrix} -6 & 6 \\ 5 & -5 \end{bmatrix}$$

The columns of $A - 7I$ are obviously linearly dependent, so (2) has nontrivial solutions.
Thus 7 is an eigenvalue of $A$. To find the corresponding eigenvectors, use row
operations:

$$\begin{bmatrix} -6 & 6 & 0 \\ 5 & -5 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & -1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

The general solution has the form $x_2 \begin{bmatrix} 1 \\ 1 \end{bmatrix}$. Each vector of this form with $x_2 \neq 0$ is an
eigenvector corresponding to $\lambda = 7$.

¹ Note that an eigenvector must be nonzero, by definition, but an eigenvalue may be zero. The case in which
the number 0 is an eigenvalue is discussed after Example 5.
Warning: Although row reduction was used in Example 3 to find eigenvectors, it
cannot be used to find eigenvalues. An echelon form of a matrix $A$ usually does not
display the eigenvalues of $A$.

The equivalence of equations (1) and (2) obviously holds for any $\lambda$ in place of
$\lambda = 7$. Thus $\lambda$ is an eigenvalue of an $n \times n$ matrix $A$ if and only if the equation

$$(A - \lambda I)\mathbf{x} = \mathbf{0} \tag{3}$$

has a nontrivial solution. The set of all solutions of (3) is just the null space of the matrix
$A - \lambda I$. So this set is a subspace of $\mathbb{R}^n$ and is called the eigenspace of $A$ corresponding
to $\lambda$. The eigenspace consists of the zero vector and all the eigenvectors corresponding
to $\lambda$.

Example 3 shows that for matrix $A$ in Example 2, the eigenspace corresponding to
$\lambda = 7$ consists of all multiples of $(1, 1)$, which is the line through $(1, 1)$ and the origin.
From Example 2, you can check that the eigenspace corresponding to $\lambda = -4$ is the
line through $(6, -5)$. These eigenspaces are shown in Figure 2, along with eigenvectors
$(1, 1)$ and $(3/2, -5/4)$ and the geometric action of the transformation $\mathbf{x} \mapsto A\mathbf{x}$ on each
eigenspace.





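EXAMPLE 4 Let $A = \begin{bmatrix} 4 & -1 & 6 \\ 2 & 1 & 6 \\ 2 & -1 & 8 \end{bmatrix}$. An eigenvalue of A is 2. Find a basis for
the corresponding eigenspace.

SOLUTION Form

\[ A - 2I = \begin{bmatrix} 4 & -1 & 6 \\ 2 & 1 & 6 \\ 2 & -1 & 8 \end{bmatrix} - \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix} = \begin{bmatrix} 2 & -1 & 6 \\ 2 & -1 & 6 \\ 2 & -1 & 6 \end{bmatrix} \]

and row reduce the augmented matrix for $(A - 2I)\mathbf{x} = \mathbf{0}$:

\[ \begin{bmatrix} 2 & -1 & 6 & 0 \\ 2 & -1 & 6 & 0 \\ 2 & -1 & 6 & 0 \end{bmatrix} \sim \begin{bmatrix} 2 & -1 & 6 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]

At this point, it is clear that 2 is indeed an eigenvalue of A because the equation
$(A - 2I)\mathbf{x} = \mathbf{0}$ has free variables. The general solution is

\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = x_2 \begin{bmatrix} 1/2 \\ 1 \\ 0 \end{bmatrix} + x_3 \begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix}, \quad x_2 \text{ and } x_3 \text{ free} \]

The eigenspace, shown in Figure 3, is a two-dimensional subspace of $\mathbb{R}^3$. A basis is

\[ \left\{ \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}, \begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix} \right\} \]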



[Figure 3 panels: the eigenspace for $\lambda = 2$, before and after multiplication by A.]

FIGURE 3 A acts as a dilation on the eigenspace.
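The method of Example 4 amounts to computing a basis for the null space of $A - \lambda I$. The following Python sketch shows one way this might be done with exact fractions, which sidesteps the roundoff issue for small integer matrices. The function names `rref` and `eigenspace_basis` are hypothetical, chosen here for illustration.

```python
from fractions import Fraction

def rref(M):
    """Return the reduced echelon form of M and the list of pivot columns."""
    M = [[Fraction(v) for v in row] for row in M]
    pivots, r = [], 0
    for c in range(len(M[0])):
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        M[r] = [v / M[r][c] for v in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == len(M):
            break
    return M, pivots

def eigenspace_basis(A, lam):
    """Basis for Nul(A - lam*I); an empty list means lam is not an eigenvalue."""
    n = len(A)
    B = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    R, pivots = rref(B)
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        # Set one free variable to 1, the others to 0, and solve for the pivots.
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for row, pc in enumerate(pivots):
            v[pc] = -R[row][free]
        basis.append(v)
    return basis

A = [[4, -1, 6], [2, 1, 6], [2, -1, 8]]
for v in eigenspace_basis(A, 2):
    print([str(t) for t in v])
```

For the matrix of Example 4 the sketch returns the two vectors that appear in the general solution, $(1/2, 1, 0)$ and $(-3, 0, 1)$. Any nonzero rescaling of a basis vector gives another valid basis.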


NUMERICAL NOTE

Example 4 shows a good method for manual computation of eigenvectors in 
simple cases when an eigenvalue is known. Using a matrix program and row 
reduction to find an eigenspace (for a specified eigenvalue) usually works, too, 
but this is not entirely reliable. Roundoff error can lead occasionally to a reduced 
echelon form with the wrong number of pivots. The best computer programs 
compute approximations for eigenvalues and eigenvectors simultaneously, to 
any desired degree of accuracy, for matrices that are not too large. The size 
of matrices that can be analyzed increases each year as computing power and 
software improve. 


The following theorem describes one of the few special cases in which eigenvalues 
can be found precisely. Calculation of eigenvalues will also be discussed in Section 5.2. 


THEOREM 1 


The eigenvalues of a triangular matrix are the entries on its main diagonal. 


PROOF For simplicity, consider the $3 \times 3$ case. If A is upper triangular, then $A - \lambda I$
has the form

\[ A - \lambda I = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix} - \begin{bmatrix} \lambda & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & \lambda \end{bmatrix} = \begin{bmatrix} a_{11} - \lambda & a_{12} & a_{13} \\ 0 & a_{22} - \lambda & a_{23} \\ 0 & 0 & a_{33} - \lambda \end{bmatrix} \]

The scalar $\lambda$ is an eigenvalue of A if and only if the equation $(A - \lambda I)\mathbf{x} = \mathbf{0}$ has a
nontrivial solution, that is, if and only if the equation has a free variable. Because of the
zero entries in $A - \lambda I$, it is easy to see that $(A - \lambda I)\mathbf{x} = \mathbf{0}$ has a free variable if and
only if at least one of the entries on the diagonal of $A - \lambda I$ is zero. This happens if and
only if $\lambda$ equals one of the entries $a_{11}$, $a_{22}$, $a_{33}$ in A. For the case in which A is lower
triangular, see Exercise 28. ■


EXAMPLE 5 Let $A = \begin{bmatrix} 3 & 6 & -8 \\ 0 & 0 & 6 \\ 0 & 0 & 2 \end{bmatrix}$ and $B = \begin{bmatrix} 4 & 0 & 0 \\ -2 & 1 & 0 \\ 5 & 3 & 4 \end{bmatrix}$. The eigenvalues
of A are 3, 0, and 2. The eigenvalues of B are 4 and 1. ■
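Theorem 1 can be checked numerically on the two matrices of Example 5. The sketch below is an illustration only, and its helper names (`det3`, `is_triangular`, `triangular_eigenvalues`, `shifted`) are hypothetical. It reads the eigenvalues off the main diagonal and then confirms, with the cofactor formula for a $3 \times 3$ determinant, that $A - \lambda I$ is singular for each of them.

```python
def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def is_triangular(M):
    """True if M is upper triangular or lower triangular."""
    n = len(M)
    upper = all(M[i][j] == 0 for i in range(n) for j in range(i))
    lower = all(M[i][j] == 0 for i in range(n) for j in range(i + 1, n))
    return upper or lower

def triangular_eigenvalues(M):
    """Distinct eigenvalues of a triangular matrix: its diagonal entries."""
    if not is_triangular(M):
        raise ValueError("matrix is not triangular")
    return sorted({M[i][i] for i in range(len(M))})

def shifted(M, lam):
    """Return M - lam*I."""
    n = len(M)
    return [[M[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]

A = [[3, 6, -8], [0, 0, 6], [0, 0, 2]]
B = [[4, 0, 0], [-2, 1, 0], [5, 3, 4]]
for M in (A, B):
    eigs = triangular_eigenvalues(M)
    # Each determinant should be zero, since M - lam*I is not invertible.
    print(eigs, [det3(shifted(M, lam)) for lam in eigs])
```

The repeated diagonal entry 4 of B is listed once, because the function returns distinct eigenvalues only.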

What does it mean for a matrix A to have an eigenvalue of 0, such as in Example 5?
This happens if and only if the equation

\[ A\mathbf{x} = 0\mathbf{x} \tag{4} \]

has a nontrivial solution. But (4) is equivalent to $A\mathbf{x} = \mathbf{0}$, which has a nontrivial solution
if and only if A is not invertible. Thus 0 is an eigenvalue of A if and only if A is not
invertible. This fact will be added to the Invertible Matrix Theorem in Section 5.2.

The following important theorem will be needed later. Its proof illustrates a typical 
calculation with eigenvectors. One way to prove the statement “If P then Q” is to show 
that P and the negation of Q leads to a contradiction. This strategy is used in the proof 
of the theorem. 


THEOREM 2 


If $\mathbf{v}_1, \ldots, \mathbf{v}_r$ are eigenvectors that correspond to distinct eigenvalues $\lambda_1, \ldots, \lambda_r$
of an $n \times n$ matrix A, then the set $\{\mathbf{v}_1, \ldots, \mathbf{v}_r\}$ is linearly independent.


PROOF Suppose $\{\mathbf{v}_1, \ldots, \mathbf{v}_r\}$ is linearly dependent. Since $\mathbf{v}_1$ is nonzero, Theorem 7 in
Section 1.7 says that one of the vectors in the set is a linear combination of the preceding
vectors. Let p be the least index such that $\mathbf{v}_{p+1}$ is a linear combination of the preceding
(linearly independent) vectors. Then there exist scalars $c_1, \ldots, c_p$ such that

\[ c_1\mathbf{v}_1 + \cdots + c_p\mathbf{v}_p = \mathbf{v}_{p+1} \tag{5} \]

Multiplying both sides of (5) by A and using the fact that $A\mathbf{v}_k = \lambda_k\mathbf{v}_k$ for each k, we
obtain

\[ c_1A\mathbf{v}_1 + \cdots + c_pA\mathbf{v}_p = A\mathbf{v}_{p+1} \]

\[ c_1\lambda_1\mathbf{v}_1 + \cdots + c_p\lambda_p\mathbf{v}_p = \lambda_{p+1}\mathbf{v}_{p+1} \tag{6} \]

Multiplying both sides of (5) by $\lambda_{p+1}$ and subtracting the result from (6), we have

\[ c_1(\lambda_1 - \lambda_{p+1})\mathbf{v}_1 + \cdots + c_p(\lambda_p - \lambda_{p+1})\mathbf{v}_p = \mathbf{0} \tag{7} \]

Since $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ is linearly independent, the weights in (7) are all zero. But none of
the factors $\lambda_i - \lambda_{p+1}$ are zero, because the eigenvalues are distinct. Hence $c_i = 0$ for
$i = 1, \ldots, p$. But then (5) says that $\mathbf{v}_{p+1} = \mathbf{0}$, which is impossible. Hence $\{\mathbf{v}_1, \ldots, \mathbf{v}_r\}$
cannot be linearly dependent and therefore must be linearly independent. ■

Eigenvectors and Difference Equations

This section concludes by showing how to construct solutions of the first-order difference
equation discussed in the chapter introductory example:

\[ \mathbf{x}_{k+1} = A\mathbf{x}_k \quad (k = 0, 1, 2, \ldots) \tag{8} \]

If A is an $n \times n$ matrix, then (8) is a recursive description of a sequence $\{\mathbf{x}_k\}$ in $\mathbb{R}^n$.
A solution of (8) is an explicit description of $\{\mathbf{x}_k\}$ whose formula for each $\mathbf{x}_k$ does not
depend directly on A or on the preceding terms in the sequence other than the initial
term $\mathbf{x}_0$.

The simplest way to build a solution of (8) is to take an eigenvector $\mathbf{x}_0$ and its
corresponding eigenvalue $\lambda$ and let

\[ \mathbf{x}_k = \lambda^k\mathbf{x}_0 \quad (k = 1, 2, \ldots) \tag{9} \]

This sequence is a solution because

\[ A\mathbf{x}_k = A(\lambda^k\mathbf{x}_0) = \lambda^k(A\mathbf{x}_0) = \lambda^k(\lambda\mathbf{x}_0) = \lambda^{k+1}\mathbf{x}_0 = \mathbf{x}_{k+1} \]

Linear combinations of solutions in the form of equation (9) are solutions, too! See
Exercise 33.
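The claim that formula (9) solves the difference equation (8) can be tested directly. The Python sketch below is an illustration under stated assumptions: it borrows the matrix of Example 2 with its eigenpairs (eigenvalue 7 with eigenvector (1, 1), eigenvalue -4 with eigenvector (6, -5)), and the weights 2 and 1 in the linear combination are arbitrary choices. The helper names `mat_vec` and `iterate` are hypothetical.

```python
def mat_vec(A, x):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def iterate(A, x0, steps):
    """Return [x_0, x_1, ..., x_steps], where x_{k+1} = A x_k."""
    xs = [list(x0)]
    for _ in range(steps):
        xs.append(mat_vec(A, xs[-1]))
    return xs

A = [[1, 6], [5, 2]]
v, lam = [1, 1], 7      # eigenvector and eigenvalue from Example 3
u, mu = [6, -5], -4     # eigenvector and eigenvalue from Example 2

# Formula (9): starting from an eigenvector, x_k = lam^k x_0.
single = iterate(A, v, 6)
single_closed = [[lam ** k * t for t in v] for k in range(7)]

# A linear combination of two such solutions (weights 2 and 1 are arbitrary).
x0 = [2 * a + 1 * b for a, b in zip(v, u)]
combo = iterate(A, x0, 6)
combo_closed = [[2 * lam ** k * a + mu ** k * b for a, b in zip(v, u)]
                for k in range(7)]

print(single == single_closed, combo == combo_closed)
```

Because all entries are integers, the comparison is exact; with floating-point data a tolerance would be needed instead of `==`.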


PRACTICE PROBLEMS 


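1. Is 5 an eigenvalue of $A = \begin{bmatrix} 6 & -3 & 1 \\ 3 & 0 & 5 \\ 2 & 2 & 6 \end{bmatrix}$?

2. If $\mathbf{x}$ is an eigenvector of A corresponding to $\lambda$, what is $A^3\mathbf{x}$?

3. Suppose that $\mathbf{b}_1$ and $\mathbf{b}_2$ are eigenvectors corresponding to distinct eigenvalues $\lambda_1$ and
$\lambda_2$, respectively, and suppose that $\mathbf{b}_3$ and $\mathbf{b}_4$ are linearly independent eigenvectors
corresponding to a third distinct eigenvalue $\lambda_3$. Does it necessarily follow that
$\{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3, \mathbf{b}_4\}$ is a linearly independent set? [Hint: Consider the equation
$c_1\mathbf{b}_1 + c_2\mathbf{b}_2 + (c_3\mathbf{b}_3 + c_4\mathbf{b}_4) = \mathbf{0}$.]

4. If A is an $n \times n$ matrix and $\lambda$ is an eigenvalue of A, show that $2\lambda$ is an eigenvalue
of $2A$.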


5.1 EXERCISES 


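1. Is $\lambda = 2$ an eigenvalue of $\begin{bmatrix} 3 & 2 \\ 3 & 8 \end{bmatrix}$? Why or why not?

2. Is $\lambda = -2$ an eigenvalue of $\begin{bmatrix} 7 & 3 \\ 3 & -1 \end{bmatrix}$? Why or why not?

3. Is $\begin{bmatrix} 1 \\ 4 \end{bmatrix}$ an eigenvector of $\begin{bmatrix} -3 & 1 \\ -3 & 8 \end{bmatrix}$? If so, find the eigenvalue.

4. Is $\begin{bmatrix} -1 + \sqrt{2} \\ 1 \end{bmatrix}$ an eigenvector of $\begin{bmatrix} 2 & 1 \\ 1 & 4 \end{bmatrix}$? If so, find the eigenvalue.

5. Is $\begin{bmatrix} 4 \\ -3 \\ 1 \end{bmatrix}$ an eigenvector of $\begin{bmatrix} 3 & 7 & 9 \\ -4 & -5 & 1 \\ 2 & 4 & 4 \end{bmatrix}$? If so, find the eigenvalue.

6. Is $\begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}$ an eigenvector of $\begin{bmatrix} 3 & 6 & 7 \\ 3 & 3 & 7 \\ 5 & 6 & 5 \end{bmatrix}$? If so, find the eigenvalue.

7. Is $\lambda = 4$ an eigenvalue of $\begin{bmatrix} 3 & 0 & -1 \\ 2 & 3 & 1 \\ -3 & 4 & 5 \end{bmatrix}$? If so, find one corresponding eigenvector.

8. Is $\lambda = 3$ an eigenvalue of $\begin{bmatrix} 1 & 2 & 2 \\ 3 & -2 & 1 \\ 0 & 1 & 1 \end{bmatrix}$? If so, find one corresponding eigenvector.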


In Exercises 9-16, find a basis for the eigenspace corresponding
to each listed eigenvalue.

9. A =

10. A =

11. A = 

12. A = 

13. A = 

14. A = 

15. A = 

16. A = 



Find the eigenvalues of the matrices in Exercises 17 and 18.

17. $\begin{bmatrix} 0 & 0 & 0 \\ 0 & 2 & 5 \\ 0 & 0 & -1 \end{bmatrix}$

18. $\begin{bmatrix} 4 & 0 & 0 \\ 0 & 0 & 0 \\ 1 & 0 & -3 \end{bmatrix}$


19. For $A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \\ 1 & 2 & 3 \end{bmatrix}$, find one eigenvalue, with no calculation. Justify your answer.


20. Without calculation, find one eigenvalue and two linearly
independent eigenvectors of $A = \begin{bmatrix} 5 & 5 & 5 \\ 5 & 5 & 5 \\ 5 & 5 & 5 \end{bmatrix}$. Justify
your answer.

In Exercises 21 and 22, A is an n x n matrix. Mark each statement 
True or False. Justify each answer. 


21. a. If $A\mathbf{x} = \lambda\mathbf{x}$ for some vector $\mathbf{x}$, then $\lambda$ is an eigenvalue of A.

b. A matrix A is not invertible if and only if 0 is an eigenvalue of A.

c. A number c is an eigenvalue of A if and only if the
equation $(A - cI)\mathbf{x} = \mathbf{0}$ has a nontrivial solution.

d. Finding an eigenvector of A may be difficult, but checking
whether a given vector is in fact an eigenvector is easy.

e. To find the eigenvalues of A, reduce A to echelon form.

22. a. If $A\mathbf{x} = \lambda\mathbf{x}$ for some scalar $\lambda$, then $\mathbf{x}$ is an eigenvector of A.

b. If $\mathbf{v}_1$ and $\mathbf{v}_2$ are linearly independent eigenvectors, then
they correspond to distinct eigenvalues.

c. A steady-state vector for a stochastic matrix is actually an
eigenvector.

d. The eigenvalues of a matrix are on its main diagonal.

e. An eigenspace of A is a null space of a certain matrix.

23. Explain why a 2 x 2 matrix can have at most two distinct 
eigenvalues. Explain why an n x n matrix can have at most 
n distinct eigenvalues. 

24. Construct an example of a 2 x 2 matrix with only one distinct 
eigenvalue. 

25. Let $\lambda$ be an eigenvalue of an invertible matrix A. Show that
$\lambda^{-1}$ is an eigenvalue of $A^{-1}$. [Hint: Suppose a nonzero $\mathbf{x}$
satisfies $A\mathbf{x} = \lambda\mathbf{x}$.]

26. Show that if $A^2$ is the zero matrix, then the only eigenvalue
of A is 0.

27. Show that $\lambda$ is an eigenvalue of A if and only if $\lambda$ is an
eigenvalue of $A^T$. [Hint: Find out how $A - \lambda I$ and $A^T - \lambda I$
are related.]

28. Use Exercise 27 to complete the proof of Theorem 1 for the
case when A is lower triangular.

29. Consider an $n \times n$ matrix A with the property that the row
sums all equal the same number s. Show that s is an eigenvalue
of A. [Hint: Find an eigenvector.]

30. Consider an $n \times n$ matrix A with the property that the column
sums all equal the same number s. Show that s is an
eigenvalue of A. [Hint: Use Exercises 27 and 29.]


In Exercises 31 and 32, let A be the matrix of the linear transformation
T. Without writing A, find an eigenvalue of A and describe
the eigenspace.


31. T is the transformation on $\mathbb{R}^2$ that reflects points across some
line through the origin.

32. T is the transformation on $\mathbb{R}^3$ that rotates points about some
line through the origin.


33. Let $\mathbf{u}$ and $\mathbf{v}$ be eigenvectors of a matrix A, with corresponding
eigenvalues $\lambda$ and $\mu$, and let $c_1$ and $c_2$ be scalars. Define

\[ \mathbf{x}_k = c_1\lambda^k\mathbf{u} + c_2\mu^k\mathbf{v} \quad (k = 0, 1, 2, \ldots) \]

a. What is $\mathbf{x}_{k+1}$, by definition?

b. Compute $A\mathbf{x}_k$ from the formula for $\mathbf{x}_k$, and show that
$A\mathbf{x}_k = \mathbf{x}_{k+1}$. This calculation will prove that the sequence
$\{\mathbf{x}_k\}$ defined above satisfies the difference equation
$\mathbf{x}_{k+1} = A\mathbf{x}_k$ $(k = 0, 1, 2, \ldots)$.

34. Describe how you might try to build a solution of a difference
equation $\mathbf{x}_{k+1} = A\mathbf{x}_k$ $(k = 0, 1, 2, \ldots)$ if you were given the
initial $\mathbf{x}_0$ and this vector did not happen to be an eigenvector
of A. [Hint: How might you relate $\mathbf{x}_0$ to eigenvectors of A?]

35. Let $\mathbf{u}$ and $\mathbf{v}$ be the vectors shown in the figure, and suppose $\mathbf{u}$
and $\mathbf{v}$ are eigenvectors of a $2 \times 2$ matrix A that correspond
to eigenvalues 2 and 3, respectively. Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be
the linear transformation given by $T(\mathbf{x}) = A\mathbf{x}$ for each $\mathbf{x}$ in
$\mathbb{R}^2$, and let $\mathbf{w} = \mathbf{u} + \mathbf{v}$. Make a copy of the figure, and on
the same coordinate system, carefully plot the vectors $T(\mathbf{u})$,
$T(\mathbf{v})$, and $T(\mathbf{w})$.

36. Repeat Exercise 35, assuming $\mathbf{u}$ and $\mathbf{v}$ are eigenvectors of A
that correspond to eigenvalues $-1$ and 3, respectively.


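[M] In Exercises 37-40, use a matrix program to find the eigenvalues
of the matrix. Then use the method of Example 4 with a
row reduction routine to produce a basis for each eigenspace.

37. $\begin{bmatrix} 8 & -10 & -5 \\ 2 & 17 & 2 \\ -9 & -18 & 4 \end{bmatrix}$

38. $\begin{bmatrix} 9 & -4 & -2 & -4 \\ -56 & 32 & -28 & 44 \\ -14 & -14 & 6 & -14 \\ 42 & -33 & 21 & -45 \end{bmatrix}$

39. $\begin{bmatrix} 4 & -9 & -7 & 8 & 2 \\ -7 & -9 & 0 & 7 & 14 \\ 5 & 10 & 5 & -5 & -10 \\ -2 & 3 & 7 & 0 & 4 \\ -3 & -13 & -7 & 10 & 11 \end{bmatrix}$

40. $\begin{bmatrix} -4 & -4 & 20 & -8 & -1 \\ 14 & 12 & 46 & 18 & 2 \\ 6 & 4 & -18 & 8 & 1 \\ 11 & 7 & -37 & 17 & 2 \\ 18 & 12 & -60 & 24 & 5 \end{bmatrix}$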

SOLUTIONS TO PRACTICE PROBLEMS 


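1. The number 5 is an eigenvalue of A if and only if the equation $(A - 5I)\mathbf{x} = \mathbf{0}$ has a
nontrivial solution. Form

\[ A - 5I = \begin{bmatrix} 6 & -3 & 1 \\ 3 & 0 & 5 \\ 2 & 2 & 6 \end{bmatrix} - \begin{bmatrix} 5 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 5 \end{bmatrix} = \begin{bmatrix} 1 & -3 & 1 \\ 3 & -5 & 5 \\ 2 & 2 & 1 \end{bmatrix} \]

and row reduce the augmented matrix:

\[ \begin{bmatrix} 1 & -3 & 1 & 0 \\ 3 & -5 & 5 & 0 \\ 2 & 2 & 1 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & -3 & 1 & 0 \\ 0 & 4 & 2 & 0 \\ 0 & 8 & -1 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & -3 & 1 & 0 \\ 0 & 4 & 2 & 0 \\ 0 & 0 & -5 & 0 \end{bmatrix} \]

At this point, it is clear that the homogeneous system has no free variables. Thus
$A - 5I$ is an invertible matrix, which means that 5 is not an eigenvalue of A.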

2. If $\mathbf{x}$ is an eigenvector of A corresponding to $\lambda$, then $A\mathbf{x} = \lambda\mathbf{x}$ and so

\[ A^2\mathbf{x} = A(\lambda\mathbf{x}) = \lambda A\mathbf{x} = \lambda^2\mathbf{x} \]

Again, $A^3\mathbf{x} = A(A^2\mathbf{x}) = A(\lambda^2\mathbf{x}) = \lambda^2A\mathbf{x} = \lambda^3\mathbf{x}$. The general pattern, $A^k\mathbf{x} = \lambda^k\mathbf{x}$,
is proved by induction.

3. Yes. Suppose $c_1\mathbf{b}_1 + c_2\mathbf{b}_2 + (c_3\mathbf{b}_3 + c_4\mathbf{b}_4) = \mathbf{0}$. Since any linear combination of
eigenvectors corresponding to the same eigenvalue is in the eigenspace for that
eigenvalue, $c_3\mathbf{b}_3 + c_4\mathbf{b}_4$ is either $\mathbf{0}$ or an eigenvector for $\lambda_3$. If $c_3\mathbf{b}_3 + c_4\mathbf{b}_4$ were
an eigenvector for $\lambda_3$, then by Theorem 2, $\{\mathbf{b}_1, \mathbf{b}_2, c_3\mathbf{b}_3 + c_4\mathbf{b}_4\}$ would be a linearly
independent set, which would force $c_1 = c_2 = 0$ and $c_3\mathbf{b}_3 + c_4\mathbf{b}_4 = \mathbf{0}$, contradicting
that $c_3\mathbf{b}_3 + c_4\mathbf{b}_4$ is an eigenvector. Thus $c_3\mathbf{b}_3 + c_4\mathbf{b}_4$ must be $\mathbf{0}$, implying that
$c_1\mathbf{b}_1 + c_2\mathbf{b}_2 = \mathbf{0}$ also. By Theorem 2, $\{\mathbf{b}_1, \mathbf{b}_2\}$ is a linearly independent set so
$c_1 = c_2 = 0$. Moreover, $\{\mathbf{b}_3, \mathbf{b}_4\}$ is a linearly independent set so $c_3 = c_4 = 0$. Since
all of the coefficients $c_1$, $c_2$, $c_3$, and $c_4$ must be zero, it follows that $\{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3, \mathbf{b}_4\}$
is a linearly independent set.

4. Since $\lambda$ is an eigenvalue of A, there is a nonzero vector $\mathbf{x}$ in $\mathbb{R}^n$ such that $A\mathbf{x} = \lambda\mathbf{x}$.
Multiplying both sides of this equation by 2 results in the equation $2(A\mathbf{x}) = 2(\lambda\mathbf{x})$.
Thus $(2A)\mathbf{x} = (2\lambda)\mathbf{x}$ and hence $2\lambda$ is an eigenvalue of $2A$.


5.2 THE CHARACTERISTIC EQUATION 


Useful information about the eigenvalues of a square matrix A is encoded in a special 
scalar equation called the characteristic equation of A . A simple example will lead to 
the general case. 


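EXAMPLE 1 Find the eigenvalues of $A = \begin{bmatrix} 2 & 3 \\ 3 & -6 \end{bmatrix}$.

SOLUTION We must find all scalars $\lambda$ such that the matrix equation

\[ (A - \lambda I)\mathbf{x} = \mathbf{0} \]

has a nontrivial solution. By the Invertible Matrix Theorem in Section 2.3, this problem
is equivalent to finding all $\lambda$ such that the matrix $A - \lambda I$ is not invertible, where

\[ A - \lambda I = \begin{bmatrix} 2 & 3 \\ 3 & -6 \end{bmatrix} - \begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix} = \begin{bmatrix} 2 - \lambda & 3 \\ 3 & -6 - \lambda \end{bmatrix} \]

By Theorem 4 in Section 2.2, this matrix fails to be invertible precisely when its
determinant is zero. So the eigenvalues of A are the solutions of the equation

\[ \det(A - \lambda I) = \det \begin{bmatrix} 2 - \lambda & 3 \\ 3 & -6 - \lambda \end{bmatrix} = 0 \]

Recall that

\[ \det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc \]

So

\[ \begin{aligned} \det(A - \lambda I) &= (2 - \lambda)(-6 - \lambda) - (3)(3) \\ &= -12 + 6\lambda - 2\lambda + \lambda^2 - 9 \\ &= \lambda^2 + 4\lambda - 21 \\ &= (\lambda - 3)(\lambda + 7) \end{aligned} \]

If $\det(A - \lambda I) = 0$, then $\lambda = 3$ or $\lambda = -7$. So the eigenvalues of A are 3 and $-7$. ■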
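For a $2 \times 2$ matrix with rows $(a, b)$ and $(c, d)$, expanding $(a - \lambda)(d - \lambda) - bc$ gives $\lambda^2 - (a + d)\lambda + (ad - bc)$, so the calculation in Example 1 can be sketched as a short Python routine. The function names below are hypothetical, and the quadratic formula is used in floating point, so the roots are approximations in general.

```python
import math

def char_poly_2x2(A):
    """Coefficients (1, p, q) of det(A - lam*I) = lam^2 + p*lam + q."""
    (a, b), (c, d) = A
    return (1, -(a + d), a * d - b * c)

def real_eigenvalues_2x2(A):
    """Real roots of the characteristic equation, in increasing order."""
    _, p, q = char_poly_2x2(A)
    disc = p * p - 4 * q
    if disc < 0:
        return []                      # no real eigenvalues
    root = math.sqrt(disc)
    return sorted({(-p - root) / 2, (-p + root) / 2})

A = [[2, 3], [3, -6]]
print(char_poly_2x2(A))          # coefficients of lam^2 + 4*lam - 21
print(real_eigenvalues_2x2(A))   # the two eigenvalues found in Example 1
```

An empty list signals a negative discriminant, which is the situation in which the matrix has no real eigenvalues.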

The determinant in Example 1 transformed the matrix equation $(A - \lambda I)\mathbf{x} = \mathbf{0}$,
which involves two unknowns ($\lambda$ and $\mathbf{x}$), into the scalar equation $\lambda^2 + 4\lambda - 21 = 0$,
which involves only one unknown. The same idea works for $n \times n$ matrices. However,
before turning to larger matrices, we summarize the properties of determinants needed
to study eigenvalues.


Determinants 

Let A be an $n \times n$ matrix, let U be any echelon form obtained from A by row
replacements and row interchanges (without scaling), and let r be the number of such
row interchanges. Then the determinant of A, written as $\det A$, is $(-1)^r$ times the
product of the diagonal entries $u_{11}, \ldots, u_{nn}$ in U. If A is invertible, then $u_{11}, \ldots, u_{nn}$
are all pivots (because $A \sim I_n$ and the $u_{ii}$ have not been scaled to 1's). Otherwise, at
least $u_{nn}$ is zero, and the product $u_{11} \cdots u_{nn}$ is zero. Thus¹

\[ \det A = \begin{cases} (-1)^r \cdot \left( \text{product of pivots in } U \right), & \text{when A is invertible} \\ 0, & \text{when A is not invertible} \end{cases} \tag{1} \]


EXAMPLE 2 Compute $\det A$ for $A = \begin{bmatrix} 1 & 5 & 0 \\ 2 & 4 & -1 \\ 0 & -2 & 0 \end{bmatrix}$.



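SOLUTION The following row reduction uses one row interchange:

\[ A \sim \begin{bmatrix} 1 & 5 & 0 \\ 0 & -6 & -1 \\ 0 & -2 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & 5 & 0 \\ 0 & -2 & 0 \\ 0 & -6 & -1 \end{bmatrix} \sim \begin{bmatrix} 1 & 5 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -1 \end{bmatrix} = U_1 \]

So $\det A$ equals $(-1)^1(1)(-2)(-1) = -2$. The following alternative row reduction
avoids the row interchange and produces a different echelon form. The last step adds
$-1/3$ times row 2 to row 3:

\[ A \sim \begin{bmatrix} 1 & 5 & 0 \\ 0 & -6 & -1 \\ 0 & -2 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & 5 & 0 \\ 0 & -6 & -1 \\ 0 & 0 & 1/3 \end{bmatrix} = U_2 \]

This time $\det A$ is $(-1)^0(1)(-6)(1/3) = -2$, the same as before. ■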
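Formula (1) translates almost directly into code. The Python sketch below is an illustration, not a production routine: `det_by_row_reduction` is a hypothetical name, and exact fractions are used so that the pivots are not disturbed by roundoff. The test matrix is the first matrix in the row reduction of Example 2; it is obtained from A by a row replacement, so it has the same determinant as A.

```python
from fractions import Fraction

def det_by_row_reduction(A):
    """(-1)^r times the product of the pivots, using replacements and interchanges only."""
    U = [[Fraction(v) for v in row] for row in A]
    n, swaps = len(U), 0
    for c in range(n):
        # Find a nonzero entry in column c, at or below the diagonal.
        p = next((i for i in range(c, n) if U[i][c] != 0), None)
        if p is None:
            return Fraction(0)          # no pivot in this column: not invertible
        if p != c:
            U[c], U[p] = U[p], U[c]     # row interchange flips the sign
            swaps += 1
        for i in range(c + 1, n):
            f = U[i][c] / U[c][c]
            U[i] = [a - f * b for a, b in zip(U[i], U[c])]   # row replacement
    product = Fraction(1)
    for i in range(n):
        product *= U[i][i]
    return (-1) ** swaps * product

M = [[1, 5, 0], [0, -6, -1], [0, -2, 0]]
print(det_by_row_reduction(M))
# Interchanging the first two rows of M forces one interchange during the
# reduction, and the determinant changes sign.
print(det_by_row_reduction([[0, -6, -1], [1, 5, 0], [0, -2, 0]]))
```

The routine never scales a row, which is what allows the product of the diagonal entries of U to be used without further correction.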

Formula (1) for the determinant shows that A is invertible if and only if $\det A$ is
nonzero. This fact, and the characterization of invertibility found in Section 5.1, can be
added to the Invertible Matrix Theorem.


THEOREM The Invertible Matrix Theorem (continued)

Let A be an $n \times n$ matrix. Then A is invertible if and only if:

s. The number 0 is not an eigenvalue of A.

t. The determinant of A is not zero.


FIGURE 1


When A is a $3 \times 3$ matrix, $|\det A|$ turns out to be the volume of the parallelepiped
determined by the columns $\mathbf{a}_1$, $\mathbf{a}_2$, and $\mathbf{a}_3$ of A, as in Figure 1. (See Section 3.3 for
details.) This volume is nonzero if and only if the vectors $\mathbf{a}_1$, $\mathbf{a}_2$, and $\mathbf{a}_3$ are linearly
independent, in which case the matrix A is invertible. (If the vectors are nonzero and
linearly dependent, they lie in a plane or along a line.)

The next theorem lists facts needed from Sections 3.1 and 3.2. Part (a) is included 
here for convenient reference. 


1 Formula (1) was derived in Section 3.2. Readers who have not studied Chapter 3 may use this formula as
the definition of $\det A$. It is a remarkable and nontrivial fact that any echelon form U obtained from A
without scaling gives the same value for $\det A$.

THEOREM 3 Properties of Determinants

Let A and B be $n \times n$ matrices.

a. A is invertible if and only if $\det A \neq 0$.

b. $\det AB = (\det A)(\det B)$.

c. $\det A^T = \det A$.

d. If A is triangular, then $\det A$ is the product of the entries on the main diagonal
of A.

e. A row replacement operation on A does not change the determinant. A row
interchange changes the sign of the determinant. A row scaling also scales the
determinant by the same scalar factor.


The Characteristic Equation

Theorem 3(a) shows how to determine when a matrix of the form $A - \lambda I$ is not
invertible. The scalar equation $\det(A - \lambda I) = 0$ is called the characteristic equation
of A, and the argument in Example 1 justifies the following fact.


A scalar $\lambda$ is an eigenvalue of an $n \times n$ matrix A if and only if $\lambda$ satisfies the
characteristic equation

\[ \det(A - \lambda I) = 0 \]


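EXAMPLE 3 Find the characteristic equation of

\[ A = \begin{bmatrix} 5 & -2 & 6 & -1 \\ 0 & 3 & -8 & 0 \\ 0 & 0 & 5 & 4 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

SOLUTION Form $A - \lambda I$, and use Theorem 3(d):

\[ \det(A - \lambda I) = \det \begin{bmatrix} 5 - \lambda & -2 & 6 & -1 \\ 0 & 3 - \lambda & -8 & 0 \\ 0 & 0 & 5 - \lambda & 4 \\ 0 & 0 & 0 & 1 - \lambda \end{bmatrix} = (5 - \lambda)(3 - \lambda)(5 - \lambda)(1 - \lambda) \]

The characteristic equation is

\[ (5 - \lambda)^2(3 - \lambda)(1 - \lambda) = 0 \]

or

\[ (\lambda - 5)^2(\lambda - 3)(\lambda - 1) = 0 \]

Expanding the product, we can also write

\[ \lambda^4 - 14\lambda^3 + 68\lambda^2 - 130\lambda + 75 = 0 \]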


In Examples 1 and 3, $\det(A - \lambda I)$ is a polynomial in $\lambda$. It can be shown that if A is
an $n \times n$ matrix, then $\det(A - \lambda I)$ is a polynomial of degree n called the characteristic
polynomial of A.

The eigenvalue 5 in Example 3 is said to have multiplicity 2 because $(\lambda - 5)$
occurs two times as a factor of the characteristic polynomial. In general, the (algebraic)
multiplicity of an eigenvalue $\lambda$ is its multiplicity as a root of the characteristic equation.
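The expansion of a factored characteristic polynomial, and the bookkeeping of multiplicities, can be sketched in Python as follows. The function name `poly_from_roots` is hypothetical; the list of roots 5, 5, 3, 1 is the one that appears in Example 3, repeated according to multiplicity.

```python
from collections import Counter

def poly_from_roots(roots):
    """Coefficients, highest degree first, of the product of (lam - r) over the roots."""
    coeffs = [1]
    for r in roots:
        # Multiply the current polynomial by (lam - r):
        # shift it up one degree, then subtract r times the polynomial.
        coeffs = [a - r * b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs

roots = [5, 5, 3, 1]
print(poly_from_roots(roots))   # coefficients of lam^4 - 14 lam^3 + 68 lam^2 - 130 lam + 75
print(Counter(roots))           # each eigenvalue with its algebraic multiplicity
```

Since the leading coefficient produced here is 1, the result is $(\lambda - 5)^2(\lambda - 3)(\lambda - 1)$; for a matrix of even size n this agrees with $\det(A - \lambda I)$, and for odd n the two differ by a factor of $-1$.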


EXAMPLE 4 The characteristic polynomial of a $6 \times 6$ matrix is $\lambda^6 - 4\lambda^5 - 12\lambda^4$.
Find the eigenvalues and their multiplicities.

SOLUTION Factor the polynomial

\[ \lambda^6 - 4\lambda^5 - 12\lambda^4 = \lambda^4(\lambda^2 - 4\lambda - 12) = \lambda^4(\lambda - 6)(\lambda + 2) \]

The eigenvalues are 0 (multiplicity 4), 6 (multiplicity 1), and $-2$ (multiplicity 1). ■

We could also list the eigenvalues in Example 4 as 0, 0, 0, 0, 6, and $-2$, so that the
eigenvalues are repeated according to their multiplicities.

Because the characteristic equation for an n x n matrix involves an nth-degree 
polynomial, the equation has exactly n roots, counting multiplicities, provided complex 
roots are allowed. Such complex roots, called complex eigenvalues , will be discussed 
in Section 5.5. Until then, we consider only real eigenvalues, and scalars will continue 
to be real numbers. 

The characteristic equation is important for theoretical purposes. In practical work,
however, eigenvalues of any matrix larger than $2 \times 2$ should be found by a computer,
unless the matrix is triangular or has other special properties. Although a $3 \times 3$ characteristic
polynomial is easy to compute by hand, factoring it can be difficult (unless the
matrix is carefully chosen). See the Numerical Notes at the end of this section.


Similarity 

The next theorem illustrates one use of the characteristic polynomial, and it provides
the foundation for several iterative methods that approximate eigenvalues. If A and
B are $n \times n$ matrices, then A is similar to B if there is an invertible matrix P
such that $P^{-1}AP = B$, or, equivalently, $A = PBP^{-1}$. Writing Q for $P^{-1}$, we have
$Q^{-1}BQ = A$. So B is also similar to A, and we say simply that A and B are similar.
Changing A into $P^{-1}AP$ is called a similarity transformation.

THEOREM 4 If $n \times n$ matrices A and B are similar, then they have the same characteristic
polynomial and hence the same eigenvalues (with the same multiplicities).


PROOF If $B = P^{-1}AP$, then

\[ B - \lambda I = P^{-1}AP - \lambda P^{-1}P = P^{-1}(AP - \lambda P) = P^{-1}(A - \lambda I)P \]

Using the multiplicative property (b) in Theorem 3, we compute

\[ \det(B - \lambda I) = \det[P^{-1}(A - \lambda I)P] = \det(P^{-1}) \cdot \det(A - \lambda I) \cdot \det(P) \tag{2} \]

Since $\det(P^{-1}) \cdot \det(P) = \det(P^{-1}P) = \det I = 1$, we see from equation (2) that
$\det(B - \lambda I) = \det(A - \lambda I)$. ■
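Theorem 4 is easy to test on a small case. The Python sketch below is illustrative only: the matrix A is the one whose characteristic polynomial is $\lambda^2 + 4\lambda - 21$, the invertible matrix P is an arbitrary choice, and the helper names are hypothetical. Exact fractions are used so that the two characteristic polynomials can be compared with `==`.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(P):
    """Inverse of an invertible 2x2 matrix, in exact arithmetic."""
    (a, b), (c, d) = P
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("P is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def char_poly_2x2(A):
    """Coefficients (1, p, q) of det(A - lam*I) = lam^2 + p*lam + q."""
    (a, b), (c, d) = A
    return (1, -(a + d), a * d - b * c)

A = [[2, 3], [3, -6]]
P = [[1, 2], [3, 4]]                      # arbitrary invertible matrix
B = mat_mul(mat_mul(inverse_2x2(P), A), P)
print(B)
print(char_poly_2x2(A) == char_poly_2x2(B))
```

The entries of B look nothing like those of A, yet the trace and determinant, and hence the characteristic polynomial, are unchanged.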


WARNINGS: 


1. The matrices 



are not similar even though they have the same eigenvalues. 

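2. Similarity is not the same as row equivalence. (If A is row equivalent to B, then
$B = EA$ for some invertible matrix E.) Row operations on a matrix usually
change its eigenvalues.

Application to Dynamical Systems

Eigenvalues and eigenvectors hold the key to the discrete evolution of a dynamical
system, as mentioned in the chapter introduction.


EXAMPLE 5 Let $A = \begin{bmatrix} .95 & .03 \\ .05 & .97 \end{bmatrix}$. Analyze the long-term behavior of the dynamical
system defined by $\mathbf{x}_{k+1} = A\mathbf{x}_k$ $(k = 0, 1, 2, \ldots)$, with $\mathbf{x}_0 = \begin{bmatrix} .6 \\ .4 \end{bmatrix}$.

SOLUTION The first step is to find the eigenvalues of A and a basis for each eigenspace.
The characteristic equation for A is

\[ 0 = \det \begin{bmatrix} .95 - \lambda & .03 \\ .05 & .97 - \lambda \end{bmatrix} = (.95 - \lambda)(.97 - \lambda) - (.03)(.05) = \lambda^2 - 1.92\lambda + .92 \]

By the quadratic formula

\[ \lambda = \frac{1.92 \pm \sqrt{(1.92)^2 - 4(.92)}}{2} = \frac{1.92 \pm \sqrt{.0064}}{2} = \frac{1.92 \pm .08}{2} = 1 \text{ or } .92 \]

It is readily checked that eigenvectors corresponding to $\lambda = 1$ and $\lambda = .92$ are multiples
of

\[ \mathbf{v}_1 = \begin{bmatrix} 3 \\ 5 \end{bmatrix} \quad \text{and} \quad \mathbf{v}_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix} \]

respectively.

The next step is to write the given $\mathbf{x}_0$ in terms of $\mathbf{v}_1$ and $\mathbf{v}_2$. This can be done because
$\{\mathbf{v}_1, \mathbf{v}_2\}$ is obviously a basis for $\mathbb{R}^2$. (Why?) So there exist weights $c_1$ and $c_2$ such that

\[ \mathbf{x}_0 = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 = [\,\mathbf{v}_1 \;\; \mathbf{v}_2\,] \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} \tag{3} \]

In fact,

\[ \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = [\,\mathbf{v}_1 \;\; \mathbf{v}_2\,]^{-1}\mathbf{x}_0 = \begin{bmatrix} 3 & 1 \\ 5 & -1 \end{bmatrix}^{-1} \begin{bmatrix} .60 \\ .40 \end{bmatrix} = \frac{1}{-8} \begin{bmatrix} -1 & -1 \\ -5 & 3 \end{bmatrix} \begin{bmatrix} .60 \\ .40 \end{bmatrix} = \begin{bmatrix} .125 \\ .225 \end{bmatrix} \tag{4} \]

Because $\mathbf{v}_1$ and $\mathbf{v}_2$ in (3) are eigenvectors of A, with $A\mathbf{v}_1 = \mathbf{v}_1$ and $A\mathbf{v}_2 = .92\mathbf{v}_2$, we
easily compute each $\mathbf{x}_k$:

\[ \begin{aligned} \mathbf{x}_1 = A\mathbf{x}_0 &= c_1A\mathbf{v}_1 + c_2A\mathbf{v}_2 && \text{Using linearity of } \mathbf{x} \mapsto A\mathbf{x} \\ &= c_1\mathbf{v}_1 + c_2(.92)\mathbf{v}_2 && \mathbf{v}_1 \text{ and } \mathbf{v}_2 \text{ are eigenvectors.} \\ \mathbf{x}_2 = A\mathbf{x}_1 &= c_1A\mathbf{v}_1 + c_2(.92)A\mathbf{v}_2 \\ &= c_1\mathbf{v}_1 + c_2(.92)^2\mathbf{v}_2 \end{aligned} \]

and so on. In general,

\[ \mathbf{x}_k = c_1\mathbf{v}_1 + c_2(.92)^k\mathbf{v}_2 \quad (k = 0, 1, 2, \ldots) \]

Using $c_1$ and $c_2$ from (4),

\[ \mathbf{x}_k = .125 \begin{bmatrix} 3 \\ 5 \end{bmatrix} + .225(.92)^k \begin{bmatrix} 1 \\ -1 \end{bmatrix} \quad (k = 0, 1, 2, \ldots) \tag{5} \]

This explicit formula for $\mathbf{x}_k$ gives the solution of the difference equation $\mathbf{x}_{k+1} = A\mathbf{x}_k$.
As $k \to \infty$, $(.92)^k$ tends to zero and $\mathbf{x}_k$ tends to $\begin{bmatrix} .375 \\ .625 \end{bmatrix} = .125\mathbf{v}_1$. ■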


The calculations in Example 5 have an interesting application to a Markov chain
discussed in Section 4.9. Those who read that section may recognize that matrix A
in Example 5 above is the same as the migration matrix M in Section 4.9, $\mathbf{x}_0$ is the
initial population distribution between city and suburbs, and $\mathbf{x}_k$ represents the population
distribution after k years.

Theorem 18 in Section 4.9 stated that for a matrix such as A, the sequence $\mathbf{x}_k$ tends
to a steady-state vector. Now we know why the $\mathbf{x}_k$ behave this way, at least for the
migration matrix. The steady-state vector is $.125\mathbf{v}_1$, a multiple of the eigenvector $\mathbf{v}_1$,
and formula (5) for $\mathbf{x}_k$ shows precisely why $\mathbf{x}_k \to .125\mathbf{v}_1$.
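The agreement between the recursion $\mathbf{x}_{k+1} = A\mathbf{x}_k$ and the explicit formula (5) can be observed numerically. The Python sketch below is an illustration in ordinary floating-point arithmetic; the names `mat_vec`, `closed_form`, and `max_gap` are hypothetical, and the choice of 100 steps is arbitrary.

```python
def mat_vec(A, x):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

A = [[0.95, 0.03], [0.05, 0.97]]
v1, v2 = [3, 5], [1, -1]        # eigenvectors for eigenvalues 1 and .92
c1, c2 = 0.125, 0.225           # weights from equation (4)

def closed_form(k):
    """Formula (5): x_k = c1*v1 + c2*(.92)^k*v2."""
    return [c1 * a + c2 * 0.92 ** k * b for a, b in zip(v1, v2)]

x = [0.6, 0.4]                  # x_0
max_gap = 0.0
for k in range(101):
    gap = max(abs(s - t) for s, t in zip(x, closed_form(k)))
    max_gap = max(max_gap, gap)
    x = mat_vec(A, x)           # advance from x_k to x_{k+1}

print(max_gap)                  # largest difference between recursion and formula
print(closed_form(100))         # already very close to the steady-state vector
```

The printed gap should be at the level of roundoff, and the second line should be close to $(.375, .625)$, the limit predicted by formula (5).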


NUMERICAL NOTES

1. Computer software such as Mathematica and Maple can use symbolic calculations
to find the characteristic polynomial of a moderate-sized matrix. But
there is no formula or finite algorithm to solve the characteristic equation of a
general $n \times n$ matrix for $n \geq 5$.

2. The best numerical methods for finding eigenvalues avoid the characteristic
polynomial entirely. In fact, MATLAB finds the characteristic polynomial
of a matrix A by first computing the eigenvalues $\lambda_1, \ldots, \lambda_n$ of A and then
expanding the product $(\lambda - \lambda_1)(\lambda - \lambda_2) \cdots (\lambda - \lambda_n)$.

3. Several common algorithms for estimating the eigenvalues of a matrix A
are based on Theorem 4. The powerful QR algorithm is discussed in the
exercises. Another technique, called Jacobi's method, works when $A = A^T$
and computes a sequence of matrices of the form

\[ A_1 = A \quad \text{and} \quad A_{k+1} = P_k^{-1}A_kP_k \quad (k = 1, 2, \ldots) \]

Each matrix in the sequence is similar to A and so has the same eigenvalues
as A. The nondiagonal entries of $A_{k+1}$ tend to zero as k increases, and the
diagonal entries tend to approach the eigenvalues of A.

4. Other methods of estimating eigenvalues are discussed in Section 5.8.


PRACTICE PROBLEM 


Find the characteristic equation and eigenvalues of $A = \begin{bmatrix} 1 & -4 \\ 4 & 2 \end{bmatrix}$.



5.2 EXERCISES 


Find the characteristic polynomial and the eigenvalues of the 
matrices in Exercises 1-8. 




Exercises 9-14 require techniques from Section 3.1. Find the
characteristic polynomial of each matrix, using either a cofactor
expansion or the special formula for $3 \times 3$ determinants described
prior to Exercises 15-18 in Section 3.1. [Note: Finding the characteristic
polynomial of a $3 \times 3$ matrix is not easy to do with just
row operations, because the variable $\lambda$ is involved.]



11 . 

13. 






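For the matrices in Exercises 15-17, list the eigenvalues, repeated
according to their multiplicities.

15. $\begin{bmatrix} 4 & -7 & 0 & 2 \\ 0 & 3 & -4 & 6 \\ 0 & 0 & 3 & -8 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

16. $\begin{bmatrix} 5 & 0 & 0 & 0 \\ 8 & -4 & 0 & 0 \\ 0 & 7 & 1 & 0 \\ 1 & -5 & 2 & 1 \end{bmatrix}$

17. $\begin{bmatrix} 3 & 0 & 0 & 0 & 0 \\ -5 & 1 & 0 & 0 & 0 \\ 3 & 8 & 0 & 0 & 0 \\ 0 & -7 & 2 & 1 & 0 \\ -4 & 1 & 9 & -2 & 3 \end{bmatrix}$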


18. It can be shown that the algebraic multiplicity of an eigenvalue
$\lambda$ is always greater than or equal to the dimension of the
eigenspace corresponding to $\lambda$. Find h in the matrix A below
such that the eigenspace for $\lambda = 5$ is two-dimensional:




19. Let A be an $n \times n$ matrix, and suppose A has n real eigenvalues,
$\lambda_1, \ldots, \lambda_n$, repeated according to multiplicities, so that

\[ \det(A - \lambda I) = (\lambda_1 - \lambda)(\lambda_2 - \lambda) \cdots (\lambda_n - \lambda) \]

Explain why $\det A$ is the product of the n eigenvalues of
A. (This result is true for any square matrix when complex
eigenvalues are considered.)

20. Use a property of determinants to show that A and $A^T$ have
the same characteristic polynomial.


In Exercises 21 and 22, A and B are n x n matrices. Mark each 
statement True or False. Justify each answer. 


21. a. The determinant of A is the product of the diagonal entries
in A.

b. An elementary row operation on A does not change the
determinant.

c. $(\det A)(\det B) = \det AB$

d. If $\lambda + 5$ is a factor of the characteristic polynomial of A,
then 5 is an eigenvalue of A.


22. a. If A is $3 \times 3$, with columns $\mathbf{a}_1$, $\mathbf{a}_2$, and $\mathbf{a}_3$, then $\det A$
equals the volume of the parallelepiped determined by $\mathbf{a}_1$,
$\mathbf{a}_2$, and $\mathbf{a}_3$.

b. $\det A^T = (-1)\det A$.

c. The multiplicity of a root r of the characteristic equation
of A is called the algebraic multiplicity of r as an eigenvalue
of A.

d. A row replacement operation on A does not change the
eigenvalues.


A widely used method for estimating eigenvalues of a general
matrix A is the QR algorithm. Under suitable conditions, this algorithm
produces a sequence of matrices, all similar to A, that become
almost upper triangular, with diagonal entries that approach
the eigenvalues of A. The main idea is to factor A (or another
matrix similar to A) in the form $A = Q_1R_1$, where $Q_1^T = Q_1^{-1}$
and $R_1$ is upper triangular. The factors are interchanged to form
$A_1 = R_1Q_1$, which is again factored as $A_1 = Q_2R_2$; then to form
$A_2 = R_2Q_2$, and so on. The similarity of $A, A_1, \ldots$ follows from
the more general result in Exercise 23.
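The factor-and-interchange idea can be tried on a small matrix. The Python sketch below is a bare illustration of the basic, unshifted iteration for an invertible $2 \times 2$ matrix; it is not how production software implements the QR algorithm. The Gram-Schmidt factorization, the function names, the test matrix (with eigenvalues 3 and $-7$), and the iteration count of 200 are all assumptions made for this example.

```python
import math

def mat_mul(X, Y):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def qr_2x2(A):
    """Factor an invertible 2x2 matrix as A = QR by the Gram-Schmidt process."""
    a1 = (A[0][0], A[1][0])                 # first column of A
    a2 = (A[0][1], A[1][1])                 # second column of A
    r11 = math.hypot(*a1)
    q1 = (a1[0] / r11, a1[1] / r11)
    r12 = q1[0] * a2[0] + q1[1] * a2[1]
    w = (a2[0] - r12 * q1[0], a2[1] - r12 * q1[1])
    r22 = math.hypot(*w)
    q2 = (w[0] / r22, w[1] / r22)
    Q = [[q1[0], q2[0]], [q1[1], q2[1]]]    # orthonormal columns
    R = [[r11, r12], [0.0, r22]]            # upper triangular
    return Q, R

Ak = [[2.0, 3.0], [3.0, -6.0]]
for _ in range(200):
    Q, R = qr_2x2(Ak)
    Ak = mat_mul(R, Q)                      # interchange the factors

print(Ak)   # nearly triangular; the diagonal approximates the eigenvalues
```

For this matrix the entry below the diagonal shrinks toward zero and the diagonal entries approach $-7$ and 3. Convergence of the unshifted iteration depends on the eigenvalues having distinct absolute values, which holds here.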


23. Show that if $A = QR$ with Q invertible, then A is similar to
$A_1 = RQ$.

24. Show that if A and B are similar, then det A = det B. 


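25. Let $A = \begin{bmatrix} .6 & .3 \\ .4 & .7 \end{bmatrix}$, $\mathbf{v}_1 = \begin{bmatrix} 3/7 \\ 4/7 \end{bmatrix}$, $\mathbf{x}_0 = \begin{bmatrix} .5 \\ .5 \end{bmatrix}$. [Note: A is
the stochastic matrix studied in Example 5 of Section 4.9.]

a. Find a basis for $\mathbb{R}^2$ consisting of $\mathbf{v}_1$ and another eigenvector
$\mathbf{v}_2$ of A.

b. Verify that $\mathbf{x}_0$ may be written in the form $\mathbf{x}_0 = \mathbf{v}_1 + c\mathbf{v}_2$.

c. For $k = 1, 2, \ldots$, define $\mathbf{x}_k = A^k\mathbf{x}_0$. Compute $\mathbf{x}_1$ and $\mathbf{x}_2$,
and write a formula for $\mathbf{x}_k$. Then show that $\mathbf{x}_k \to \mathbf{v}_1$ as k
increases.


26. Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. Use formula (1) for a determinant
(given before Example 2) to show that $\det A = ad - bc$.
Consider two cases: $a \neq 0$ and $a = 0$.


27. Let $A = \begin{bmatrix} .5 & .2 & .3 \\ .3 & .8 & .3 \\ .2 & 0 & .4 \end{bmatrix}$, $\mathbf{v}_1 = \begin{bmatrix} .3 \\ .6 \\ .1 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 1 \\ -3 \\ 2 \end{bmatrix}$,
$\mathbf{v}_3 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$, and $\mathbf{w} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$.

a. Show that $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ are eigenvectors of A. [Note: A is
the stochastic matrix studied in Example 3 of Section 4.9.]

b. Let $\mathbf{x}_0$ be any vector in $\mathbb{R}^3$ with nonnegative entries whose
sum is 1. (In Section 4.9, $\mathbf{x}_0$ was called a probability
vector.) Explain why there are constants $c_1$, $c_2$, and $c_3$
such that $\mathbf{x}_0 = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$. Compute $\mathbf{w}^T\mathbf{x}_0$, and
deduce that $c_1 = 1$.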

c. For $k = 1, 2, \ldots$, define $\mathbf{x}_k = A^k\mathbf{x}_0$, with $\mathbf{x}_0$ as in part
(b). Show that $\mathbf{x}_k \to \mathbf{v}_1$ as k increases.

28. [M] Construct a random integer-valued $4 \times 4$ matrix A, and
verify that A and $A^T$ have the same characteristic polynomial
(the same eigenvalues with the same multiplicities). Do A
and $A^T$ have the same eigenvectors? Make the same analysis
of a $5 \times 5$ matrix. Report the matrices and your conclusions.

29. [M] Construct a random integer-valued $4 \times 4$ matrix A.

a. Reduce A to echelon form U with no row scaling, and use
U in formula (1) (before Example 2) to compute $\det A$. (If
A happens to be singular, start over with a new random
matrix.)

b. Compute the eigenvalues of A and the product of these
eigenvalues (as accurately as possible).

c. List the matrix A, and, to four decimal places, list the
pivots in U and the eigenvalues of A. Compute $\det A$ with
your matrix program, and compare it with the products
you found in (a) and (b).


30. [M] Let $A = \begin{bmatrix} -6 & 28 & 21 \\ 4 & -15 & -12 \\ -8 & a & 25 \end{bmatrix}$. For each value of a in
the set {32, 31.9, 31.8, 32.1, 32.2}, compute the characteristic
polynomial of A and the eigenvalues. In each case, create
a graph of the characteristic polynomial $p(t) = \det(A - tI)$
for $0 < t < 3$. If possible, construct all graphs on one coordinate
system. Describe how the graphs reveal the changes in
the eigenvalues as a changes.


SOLUTION TO PRACTICE PROBLEM 


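The characteristic equation is

$$0 = \det(A - \lambda I) = \det \begin{bmatrix} 1-\lambda & -4 \\ 4 & 2-\lambda \end{bmatrix} = (1-\lambda)(2-\lambda) - (-4)(4) = \lambda^2 - 3\lambda + 18$$

From the quadratic formula,

$$\lambda = \frac{3 \pm \sqrt{(-3)^2 - 4(18)}}{2} = \frac{3 \pm \sqrt{-63}}{2}$$

It is clear that the characteristic equation has no real solutions, so $A$ has no real eigenvalues. The matrix $A$ is acting on the real vector space $\mathbb{R}^2$, and there is no nonzero vector $\mathbf{v}$ in $\mathbb{R}^2$ such that $A\mathbf{v} = \lambda\mathbf{v}$ for some scalar $\lambda$.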
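
The discriminant test used in this solution can be checked mechanically. The following is a minimal Python sketch, not part of the text: the helper names `char_poly_2x2` and `real_eigenvalues_2x2` are hypothetical, and the code assumes a real $2 \times 2$ matrix with entries $a, b, c, d$ read row by row.

```python
import math


def char_poly_2x2(a, b, c, d):
    """Return (trace, det) so that det(A - tI) = t^2 - trace*t + det."""
    return a + d, a * d - b * c


def real_eigenvalues_2x2(a, b, c, d):
    """Return the real eigenvalues of [[a, b], [c, d]] (empty list if none)."""
    trace, det = char_poly_2x2(a, b, c, d)
    disc = trace * trace - 4 * det      # discriminant of the quadratic
    if disc < 0:
        return []                       # both roots are complex
    root = math.sqrt(disc)
    return sorted({(trace - root) / 2, (trace + root) / 2})


# Matrix of the practice problem: A = [[1, -4], [4, 2]]
trace, det = char_poly_2x2(1, -4, 4, 2)
disc = trace * trace - 4 * det
print(trace, det, disc)                      # coefficients and discriminant
print(real_eigenvalues_2x2(1, -4, 4, 2))     # empty: no real eigenvalues
```

A negative discriminant is exactly the situation described above: the quadratic formula produces a square root of a negative number, so no real scalar satisfies the characteristic equation.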


5.3 DIAGONALIZATION 


In many cases, the eigenvalue-eigenvector information contained within a matrix $A$ can be displayed in a useful factorization of the form $A = PDP^{-1}$ where $D$ is a diagonal matrix. In this section, the factorization enables us to compute $A^k$ quickly for large values of $k$, a fundamental idea in several applications of linear algebra. Later, in Sections 5.6 and 5.7, the factorization will be used to analyze (and decouple) dynamical systems.

The following example illustrates that powers of a diagonal matrix are easy to 
compute. 


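EXAMPLE 1 If $D = \begin{bmatrix} 5 & 0 \\ 0 & 3 \end{bmatrix}$, then $D^2 = \begin{bmatrix} 5 & 0 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} 5 & 0 \\ 0 & 3 \end{bmatrix} = \begin{bmatrix} 5^2 & 0 \\ 0 & 3^2 \end{bmatrix}$ and

$$D^3 = DD^2 = \begin{bmatrix} 5 & 0 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} 5^2 & 0 \\ 0 & 3^2 \end{bmatrix} = \begin{bmatrix} 5^3 & 0 \\ 0 & 3^3 \end{bmatrix}$$

In general,

$$D^k = \begin{bmatrix} 5^k & 0 \\ 0 & 3^k \end{bmatrix} \quad \text{for } k \ge 1$$ ■

If $A = PDP^{-1}$ for some invertible $P$ and diagonal $D$, then $A^k$ is also easy to compute, as the next example shows.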






























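EXAMPLE 2 Let $A = \begin{bmatrix} 7 & 2 \\ -4 & 1 \end{bmatrix}$. Find a formula for $A^k$, given that $A = PDP^{-1}$, where

$$P = \begin{bmatrix} 1 & 1 \\ -1 & -2 \end{bmatrix} \quad \text{and} \quad D = \begin{bmatrix} 5 & 0 \\ 0 & 3 \end{bmatrix}$$

SOLUTION The standard formula for the inverse of a $2 \times 2$ matrix yields

$$P^{-1} = \begin{bmatrix} 2 & 1 \\ -1 & -1 \end{bmatrix}$$

Then, by associativity of matrix multiplication,

$$A^2 = (PDP^{-1})(PDP^{-1}) = PD(P^{-1}P)DP^{-1} = PDDP^{-1} = PD^2P^{-1}$$

where $P^{-1}P = I$. Again,

$$A^3 = (PDP^{-1})A^2 = (PDP^{-1})PD^2P^{-1} = PDD^2P^{-1} = PD^3P^{-1}$$

In general, for $k \ge 1$,

$$A^k = PD^kP^{-1} = \begin{bmatrix} 1 & 1 \\ -1 & -2 \end{bmatrix}\begin{bmatrix} 5^k & 0 \\ 0 & 3^k \end{bmatrix}\begin{bmatrix} 2 & 1 \\ -1 & -1 \end{bmatrix} = \begin{bmatrix} 2 \cdot 5^k - 3^k & 5^k - 3^k \\ 2 \cdot 3^k - 2 \cdot 5^k & 2 \cdot 3^k - 5^k \end{bmatrix}$$ ■

A square matrix $A$ is said to be diagonalizable if $A$ is similar to a diagonal matrix, that is, if $A = PDP^{-1}$ for some invertible matrix $P$ and some diagonal matrix $D$. The next theorem gives a characterization of diagonalizable matrices and tells how to construct a suitable factorization.


THEOREM 5 The Diagonalization Theorem

An $n \times n$ matrix $A$ is diagonalizable if and only if $A$ has $n$ linearly independent eigenvectors.

In fact, $A = PDP^{-1}$, with $D$ a diagonal matrix, if and only if the columns of $P$ are $n$ linearly independent eigenvectors of $A$. In this case, the diagonal entries of $D$ are eigenvalues of $A$ that correspond, respectively, to the eigenvectors in $P$.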


In other words, $A$ is diagonalizable if and only if there are enough eigenvectors to form a basis of $\mathbb{R}^n$. We call such a basis an eigenvector basis of $\mathbb{R}^n$.


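PROOF First, observe that if $P$ is any $n \times n$ matrix with columns $\mathbf{v}_1, \ldots, \mathbf{v}_n$, and if $D$ is any diagonal matrix with diagonal entries $\lambda_1, \ldots, \lambda_n$, then

$$AP = A[\,\mathbf{v}_1 \ \ \mathbf{v}_2 \ \cdots \ \mathbf{v}_n\,] = [\,A\mathbf{v}_1 \ \ A\mathbf{v}_2 \ \cdots \ A\mathbf{v}_n\,] \qquad (1)$$

while

$$PD = P\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} = [\,\lambda_1\mathbf{v}_1 \ \ \lambda_2\mathbf{v}_2 \ \cdots \ \lambda_n\mathbf{v}_n\,] \qquad (2)$$

Now suppose $A$ is diagonalizable and $A = PDP^{-1}$. Then right-multiplying this relation by $P$, we have $AP = PD$. In this case, equations (1) and (2) imply that

$$[\,A\mathbf{v}_1 \ \ A\mathbf{v}_2 \ \cdots \ A\mathbf{v}_n\,] = [\,\lambda_1\mathbf{v}_1 \ \ \lambda_2\mathbf{v}_2 \ \cdots \ \lambda_n\mathbf{v}_n\,] \qquad (3)$$

Equating columns, we find that

$$A\mathbf{v}_1 = \lambda_1\mathbf{v}_1, \quad A\mathbf{v}_2 = \lambda_2\mathbf{v}_2, \quad \ldots, \quad A\mathbf{v}_n = \lambda_n\mathbf{v}_n \qquad (4)$$

Since $P$ is invertible, its columns $\mathbf{v}_1, \ldots, \mathbf{v}_n$ must be linearly independent. Also, since these columns are nonzero, the equations in (4) show that $\lambda_1, \ldots, \lambda_n$ are eigenvalues and $\mathbf{v}_1, \ldots, \mathbf{v}_n$ are corresponding eigenvectors. This argument proves the "only if" parts of the first and second statements, along with the third statement, of the theorem.

Finally, given any $n$ eigenvectors $\mathbf{v}_1, \ldots, \mathbf{v}_n$, use them to construct the columns of $P$ and use corresponding eigenvalues $\lambda_1, \ldots, \lambda_n$ to construct $D$. By equations (1)-(3), $AP = PD$. This is true without any condition on the eigenvectors. If, in fact, the eigenvectors are linearly independent, then $P$ is invertible (by the Invertible Matrix Theorem), and $AP = PD$ implies that $A = PDP^{-1}$. ■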


Diagonalizing Matrices 

EXAMPLE 3 Diagonalize the following matrix, if possible.

$$A = \begin{bmatrix} 1 & 3 & 3 \\ -3 & -5 & -3 \\ 3 & 3 & 1 \end{bmatrix}$$

That is, find an invertible matrix $P$ and a diagonal matrix $D$ such that $A = PDP^{-1}$.

SOLUTION There are four steps to implement the description in Theorem 5. 

Step 1. Find the eigenvalues of $A$. As mentioned in Section 5.2, the mechanics of this step are appropriate for a computer when the matrix is larger than $2 \times 2$. To avoid unnecessary distractions, the text will usually supply information needed for this step. In the present case, the characteristic equation turns out to involve a cubic polynomial that can be factored:

$$0 = \det(A - \lambda I) = -\lambda^3 - 3\lambda^2 + 4 = -(\lambda - 1)(\lambda + 2)^2$$

The eigenvalues are $\lambda = 1$ and $\lambda = -2$.


Step 2. Find three linearly independent eigenvectors of $A$. Three vectors are needed because $A$ is a $3 \times 3$ matrix. This is the critical step. If it fails, then Theorem 5 says that $A$ cannot be diagonalized. The method in Section 5.1 produces a basis for each eigenspace:

Basis for $\lambda = 1$: $\mathbf{v}_1 = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}$

Basis for $\lambda = -2$: $\mathbf{v}_2 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}$ and $\mathbf{v}_3 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$

You can check that $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a linearly independent set.

Step 3. Construct $P$ from the vectors in step 2. The order of the vectors is unimportant. Using the order chosen in step 2, form

$$P = [\,\mathbf{v}_1 \ \ \mathbf{v}_2 \ \ \mathbf{v}_3\,] = \begin{bmatrix} 1 & -1 & -1 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}$$


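Step 4. Construct $D$ from the corresponding eigenvalues. In this step, it is essential that the order of the eigenvalues matches the order chosen for the columns of $P$. Use the eigenvalue $\lambda = -2$ twice, once for each of the eigenvectors corresponding to $\lambda = -2$:

$$D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -2 \end{bmatrix}$$

It is a good idea to check that $P$ and $D$ really work. To avoid computing $P^{-1}$, simply verify that $AP = PD$. This is equivalent to $A = PDP^{-1}$ when $P$ is invertible. (However, be sure that $P$ is invertible!) Compute

$$AP = \begin{bmatrix} 1 & 3 & 3 \\ -3 & -5 & -3 \\ 3 & 3 & 1 \end{bmatrix}\begin{bmatrix} 1 & -1 & -1 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 2 \\ -1 & -2 & 0 \\ 1 & 0 & -2 \end{bmatrix}$$

$$PD = \begin{bmatrix} 1 & -1 & -1 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -2 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 2 \\ -1 & -2 & 0 \\ 1 & 0 & -2 \end{bmatrix}$$ ■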
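
The check $AP = PD$ recommended above is easy to automate. The sketch below is an illustration only, written in plain Python with exact integer arithmetic; the helper names `matmul` and `det3` are hypothetical, and the matrices are those of Example 3.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along row 1."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


A = [[1, 3, 3], [-3, -5, -3], [3, 3, 1]]
P = [[1, -1, -1], [-1, 1, 0], [1, 0, 1]]    # columns v1, v2, v3
D = [[1, 0, 0], [0, -2, 0], [0, 0, -2]]     # eigenvalues in matching order

AP = matmul(A, P)
PD = matmul(P, D)
print(AP == PD)        # the factorization check
print(det3(P) != 0)    # P must also be invertible
```

Both conditions matter: $AP = PD$ holds for any choice of eigenvector columns, and only the invertibility of $P$ turns it into $A = PDP^{-1}$.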



EXAMPLE 4 Diagonalize the following matrix, if possible. 





SOLUTION The characteristic equation of $A$ turns out to be exactly the same as that in Example 3:

$$0 = \det(A - \lambda I) = -\lambda^3 - 3\lambda^2 + 4 = -(\lambda - 1)(\lambda + 2)^2$$

The eigenvalues are $\lambda = 1$ and $\lambda = -2$. However, it is easy to verify that each eigenspace is only one-dimensional:

Basis for $\lambda = 1$:

Basis for $\lambda = -2$:






There are no other eigenvalues, and every eigenvector of $A$ is a multiple of either $\mathbf{v}_1$ or $\mathbf{v}_2$. Hence it is impossible to construct a basis of $\mathbb{R}^3$ using eigenvectors of $A$. By Theorem 5, $A$ is not diagonalizable. ■


The following theorem provides a sufficient condition for a matrix to be 
diagonalizable. 


THEOREM 6

An $n \times n$ matrix with $n$ distinct eigenvalues is diagonalizable.

PROOF Let $\mathbf{v}_1, \ldots, \mathbf{v}_n$ be eigenvectors corresponding to the $n$ distinct eigenvalues of a matrix $A$. Then $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ is linearly independent, by Theorem 2 in Section 5.1. Hence $A$ is diagonalizable, by Theorem 5. ■

It is not necessary for an $n \times n$ matrix to have $n$ distinct eigenvalues in order to be diagonalizable. The $3 \times 3$ matrix in Example 3 is diagonalizable even though it has only two distinct eigenvalues.


EXAMPLE 5 Determine if the following matrix is diagonalizable.



SOLUTION This is easy! Since the matrix is triangular, its eigenvalues are obviously 5, 0, and $-2$. Since $A$ is a $3 \times 3$ matrix with three distinct eigenvalues, $A$ is diagonalizable. ■


Matrices Whose Eigenvalues Are Not Distinct

If an $n \times n$ matrix $A$ has $n$ distinct eigenvalues, with corresponding eigenvectors $\mathbf{v}_1, \ldots, \mathbf{v}_n$, and if $P = [\,\mathbf{v}_1 \ \cdots \ \mathbf{v}_n\,]$, then $P$ is automatically invertible because its columns are linearly independent, by Theorem 2. When $A$ is diagonalizable but has fewer than $n$ distinct eigenvalues, it is still possible to build $P$ in a way that makes $P$ automatically invertible, as the next theorem shows.¹


THEOREM 7

Let $A$ be an $n \times n$ matrix whose distinct eigenvalues are $\lambda_1, \ldots, \lambda_p$.

a. For $1 \le k \le p$, the dimension of the eigenspace for $\lambda_k$ is less than or equal to the multiplicity of the eigenvalue $\lambda_k$.

b. The matrix $A$ is diagonalizable if and only if the sum of the dimensions of the eigenspaces equals $n$, and this happens if and only if (i) the characteristic polynomial factors completely into linear factors and (ii) the dimension of the eigenspace for each $\lambda_k$ equals the multiplicity of $\lambda_k$.

c. If $A$ is diagonalizable and $\mathcal{B}_k$ is a basis for the eigenspace corresponding to $\lambda_k$ for each $k$, then the total collection of vectors in the sets $\mathcal{B}_1, \ldots, \mathcal{B}_p$ forms an eigenvector basis for $\mathbb{R}^n$.


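EXAMPLE 6 Diagonalize the following matrix, if possible.

$$A = \begin{bmatrix} 5 & 0 & 0 & 0 \\ 0 & 5 & 0 & 0 \\ 1 & 4 & -3 & 0 \\ -1 & -2 & 0 & -3 \end{bmatrix}$$

SOLUTION Since $A$ is a triangular matrix, the eigenvalues are 5 and $-3$, each with multiplicity 2. Using the method in Section 5.1, we find a basis for each eigenspace.

Basis for $\lambda = 5$: $\mathbf{v}_1 = \begin{bmatrix} -8 \\ 4 \\ 1 \\ 0 \end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix} -16 \\ 4 \\ 0 \\ 1 \end{bmatrix}$

Basis for $\lambda = -3$: $\mathbf{v}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}$ and $\mathbf{v}_4 = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}$

The set $\{\mathbf{v}_1, \ldots, \mathbf{v}_4\}$ is linearly independent, by Theorem 7. So the matrix $P = [\,\mathbf{v}_1 \ \cdots \ \mathbf{v}_4\,]$ is invertible, and $A = PDP^{-1}$, where

$$P = \begin{bmatrix} -8 & -16 & 0 & 0 \\ 4 & 4 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix} \quad \text{and} \quad D = \begin{bmatrix} 5 & 0 & 0 & 0 \\ 0 & 5 & 0 & 0 \\ 0 & 0 & -3 & 0 \\ 0 & 0 & 0 & -3 \end{bmatrix}$$ ■

¹ The proof of Theorem 7 is somewhat lengthy but not difficult. For instance, see S. Friedberg, A. Insel, and L. Spence, Linear Algebra, 4th ed. (Englewood Cliffs, NJ: Prentice-Hall, 2002), Section 5.2.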
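
The criterion in Theorem 7(b), that the dimensions of the eigenspaces must add up to $n$, can be tested by computing the rank of $A - \lambda I$ for each eigenvalue. The following Python sketch is one way to do this under stated assumptions: the eigenvalues are already known, arithmetic is exact (`fractions.Fraction`), and the helper names `rank` and `eigenspace_dim` are hypothetical.

```python
from fractions import Fraction


def rank(M):
    """Rank of a matrix (list of rows) by exact forward elimination."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0                                   # number of pivots found so far
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue                        # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def eigenspace_dim(A, lam):
    """dim Nul(A - lam*I) = n - rank(A - lam*I)."""
    n = len(A)
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)


# Matrix of Example 6; its eigenvalues 5 and -3 are read off the diagonal.
A = [[5, 0, 0, 0], [0, 5, 0, 0], [1, 4, -3, 0], [-1, -2, 0, -3]]
dims = {lam: eigenspace_dim(A, lam) for lam in (5, -3)}
print(dims, sum(dims.values()) == len(A))
```

When the dimensions sum to $n$, as here, the bases of the eigenspaces together supply the $n$ columns of $P$.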


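PRACTICE PROBLEMS

1. Compute $A^8$, where $A = \begin{bmatrix} 4 & -3 \\ 2 & -1 \end{bmatrix}$.

2. Let $A = \begin{bmatrix} -3 & 12 \\ -2 & 7 \end{bmatrix}$, $\mathbf{v}_1 = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$, and $\mathbf{v}_2 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$. Suppose you are told that $\mathbf{v}_1$ and $\mathbf{v}_2$ are eigenvectors of $A$. Use this information to diagonalize $A$.

3. Let $A$ be a $4 \times 4$ matrix with eigenvalues 5, 3, and $-2$, and suppose you know that the eigenspace for $\lambda = 3$ is two-dimensional. Do you have enough information to determine if $A$ is diagonalizable?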


5.3 EXERCISES 


In Exercises 1 and 2, let $A = PDP^{-1}$ and compute $A^4$.


5. 


1. P = 


5 

2 


7 

3 


,D = 


2 

1 


2 

3 


1 

1 


2. P = 


2 

3 


3 

5 



"2 

0" 


_ 1 

2 

2 








_0 

1 
















" 1 

1 

2" 


"5 

0 

0 


'1/4 

1/2 

1/4“ 



"1 

_0 

o' 

1/2 


1 

0 

-1 


0 

1 

0 


1/4 

1/2 

-3/4 

,D 

— 


1 

-1 

0 


0 

0 

1 


1/4 

-1/2 

!/4 _ 


In Exercises 3 and 4, use the factorization $A = PDP^{-1}$ to compute $A^k$, where $k$ represents an arbitrary positive integer.



a 

3 (a — b ) 





In Exercises 5 and 6, the matrix $A$ is factored in the form $PDP^{-1}$. Use the Diagonalization Theorem to find the eigenvalues of $A$ and a basis for each eigenspace.


Diagonalize the matrices in Exercises 7-20, if possible. The eigenvalues for Exercises 11-16 are as follows: (11) $\lambda = 1, 2, 3$; (12) $\lambda = 2, 8$; (13) $\lambda = 5, 1$; (14) $\lambda = 5, 4$; (15) $\lambda = 3, 1$; (16) $\lambda = 2, 1$. For Exercise 18, one eigenvalue is $\lambda = 5$ and one eigenvector is $(-2, 1, 2)$.

7.

9 . 

11 . 








4 
1 
0 

5 
0 
0 
0 








4 

o 

i 

-2 


2 

5 

4 


0 

0 

Ul 

1_ 


0 

-4 

-6 


-1 

0 

-3 


1 

2 

5 


-7 

-16 


4' 

6 

13 

— 

2 

12 

16 


1 



In Exercises 21 and 22, $A$, $B$, $P$, and $D$ are $n \times n$ matrices. Mark each statement True or False. Justify each answer. (Study Theorems 5 and 6 and the examples in this section carefully before you try these exercises.)

21. a. $A$ is diagonalizable if $A = PDP^{-1}$ for some matrix $D$ and some invertible matrix $P$.

b. If $\mathbb{R}^n$ has a basis of eigenvectors of $A$, then $A$ is diagonalizable.

c. $A$ is diagonalizable if and only if $A$ has $n$ eigenvalues, counting multiplicities.

d. If $A$ is diagonalizable, then $A$ is invertible.

22. a. $A$ is diagonalizable if $A$ has $n$ eigenvectors.

b. If $A$ is diagonalizable, then $A$ has $n$ distinct eigenvalues.

c. If $AP = PD$, with $D$ diagonal, then the nonzero columns of $P$ must be eigenvectors of $A$.

d. If $A$ is invertible, then $A$ is diagonalizable.

23. $A$ is a $5 \times 5$ matrix with two eigenvalues. One eigenspace is three-dimensional, and the other eigenspace is two-dimensional. Is $A$ diagonalizable? Why?

24. $A$ is a $3 \times 3$ matrix with two eigenvalues. Each eigenspace is one-dimensional. Is $A$ diagonalizable? Why?

25. $A$ is a $4 \times 4$ matrix with three eigenvalues. One eigenspace is one-dimensional, and one of the other eigenspaces is two-dimensional. Is it possible that $A$ is not diagonalizable? Justify your answer.

26. $A$ is a $7 \times 7$ matrix with three eigenvalues. One eigenspace is two-dimensional, and one of the other eigenspaces is three-dimensional. Is it possible that $A$ is not diagonalizable? Justify your answer.

27. Show that if $A$ is both diagonalizable and invertible, then so is $A^{-1}$.

28. Show that if $A$ has $n$ linearly independent eigenvectors, then so does $A^T$. [Hint: Use the Diagonalization Theorem.]

29. A factorization $A = PDP^{-1}$ is not unique. Demonstrate this for the matrix $A$ in Example 2. With $D_1 = \begin{bmatrix} 3 & 0 \\ 0 & 5 \end{bmatrix}$, use the information in Example 2 to find a matrix $P_1$ such that $A = P_1 D_1 P_1^{-1}$.

30. With $A$ and $D$ as in Example 2, find an invertible $P_2$ unequal to the $P$ in Example 2, such that $A = P_2 D P_2^{-1}$.

31. Construct a nonzero $2 \times 2$ matrix that is invertible but not diagonalizable.

32. Construct a nondiagonal $2 \times 2$ matrix that is diagonalizable but not invertible.


[M] Diagonalize the matrices in Exercises 33-36. Use your matrix program's eigenvalue command to find the eigenvalues, and then compute bases for the eigenspaces as in Section 5.1.





-6 

4 

0 

_i 


i 

o 

13 

00 

1 

-3 

0 

1 

6 

34 . 

4 

9 

8 

4 

-1 

-2 

1 

0 

8 

6 

12 

8 

-4 

4 

0 

7 


0 

5 

0 

-4 


11 

-6 

4 

-10 

-4 

-3 

5 

-2 

4 

1 

-8 

12 

-3 

12 

4 

1 

6 

-2 

3 

-1 

8 

-18 

8 

-14 

- 1 _ 

4 

4 

2 

3 

-2" 

0 

1 

-2 

-2 

2 

6 

12 

11 

2 

-4 

9 

20 

10 

10 

-6 

15 

28 

14 

5 

-3


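SOLUTIONS TO PRACTICE PROBLEMS

1. $\det(A - \lambda I) = \lambda^2 - 3\lambda + 2 = (\lambda - 2)(\lambda - 1)$. The eigenvalues are 2 and 1, and the corresponding eigenvectors are $\mathbf{v}_1 = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$. Next, form

$$P = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}, \quad D = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}, \quad \text{and} \quad P^{-1} = \begin{bmatrix} 1 & -1 \\ -2 & 3 \end{bmatrix}$$

Since $A = PDP^{-1}$,

$$A^8 = PD^8P^{-1} = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} 2^8 & 0 \\ 0 & 1^8 \end{bmatrix}\begin{bmatrix} 1 & -1 \\ -2 & 3 \end{bmatrix} = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} 256 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & -1 \\ -2 & 3 \end{bmatrix} = \begin{bmatrix} 766 & -765 \\ 510 & -509 \end{bmatrix}$$

2. Compute $A\mathbf{v}_1 = \begin{bmatrix} -3 & 12 \\ -2 & 7 \end{bmatrix}\begin{bmatrix} 3 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 1 \end{bmatrix} = 1 \cdot \mathbf{v}_1$, and

$$A\mathbf{v}_2 = \begin{bmatrix} -3 & 12 \\ -2 & 7 \end{bmatrix}\begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 6 \\ 3 \end{bmatrix} = 3 \cdot \mathbf{v}_2$$

So, $\mathbf{v}_1$ and $\mathbf{v}_2$ are eigenvectors for the eigenvalues 1 and 3, respectively. Thus

$$A = PDP^{-1}, \quad \text{where} \quad P = \begin{bmatrix} 3 & 2 \\ 1 & 1 \end{bmatrix} \quad \text{and} \quad D = \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}$$

3. Yes, $A$ is diagonalizable. There is a basis $\{\mathbf{v}_1, \mathbf{v}_2\}$ for the eigenspace corresponding to $\lambda = 3$. In addition, there will be at least one eigenvector for $\lambda = 5$ and one for $\lambda = -2$. Call them $\mathbf{v}_3$ and $\mathbf{v}_4$. Then $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$ is linearly independent by Theorem 2 and Practice Problem 3 in Section 5.1. There can be no additional eigenvectors that are linearly independent from $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4$, because the vectors are all in $\mathbb{R}^4$. Hence the eigenspaces for $\lambda = 5$ and $\lambda = -2$ are both one-dimensional. It follows that $A$ is diagonalizable by Theorem 7(b).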
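
The power computation in Practice Problem 1 can be cross-checked against repeated multiplication. This is a minimal Python sketch rather than anything from the text; the helper names `matmul`, `matpow`, and `power_via_diagonalization` are hypothetical, and $P$, $D$, and $P^{-1}$ are the matrices found in the solution.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def matpow(M, k):
    """M^k by k successive multiplications (slow, used only as a check)."""
    n = len(M)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, M)
    return result


def power_via_diagonalization(P, diag, P_inv, k):
    """Compute P D^k P^{-1}, where diag lists the diagonal entries of D."""
    n = len(diag)
    Dk = [[diag[i] ** k if i == j else 0 for j in range(n)] for i in range(n)]
    return matmul(matmul(P, Dk), P_inv)


P = [[3, 1], [2, 1]]
P_inv = [[1, -1], [-2, 3]]
diag = [2, 1]

A = power_via_diagonalization(P, diag, P_inv, 1)    # recovers A itself
A8 = power_via_diagonalization(P, diag, P_inv, 8)
print(A)
print(A8, A8 == matpow(A, 8))
```

The diagonal route needs only two matrix products for any $k$, because $D^k$ is formed entry by entry.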


5.4 EIGENVECTORS AND LINEAR TRANSFORMATIONS 


The goal of this section is to understand the matrix factorization $A = PDP^{-1}$ as a statement about linear transformations. We shall see that the transformation $\mathbf{x} \mapsto A\mathbf{x}$ is essentially the same as the very simple mapping $\mathbf{u} \mapsto D\mathbf{u}$, when viewed from the proper perspective. A similar interpretation will apply to $A$ and $D$ even when $D$ is not a diagonal matrix.

Recall from Section 1.9 that any linear transformation $T$ from $\mathbb{R}^n$ to $\mathbb{R}^m$ can be implemented via left-multiplication by a matrix $A$, called the standard matrix of $T$. Now we need the same sort of representation for any linear transformation between two finite-dimensional vector spaces.



















































The Matrix of a Linear Transformation 


Let $V$ be an $n$-dimensional vector space, let $W$ be an $m$-dimensional vector space, and let $T$ be any linear transformation from $V$ to $W$. To associate a matrix with $T$, choose (ordered) bases $\mathcal{B}$ and $\mathcal{C}$ for $V$ and $W$, respectively.

Given any $\mathbf{x}$ in $V$, the coordinate vector $[\mathbf{x}]_{\mathcal{B}}$ is in $\mathbb{R}^n$ and the coordinate vector of its image, $[T(\mathbf{x})]_{\mathcal{C}}$, is in $\mathbb{R}^m$, as shown in Figure 1.

FIGURE 1 A linear transformation from $V$ to $W$.


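The connection between $[\mathbf{x}]_{\mathcal{B}}$ and $[T(\mathbf{x})]_{\mathcal{C}}$ is easy to find. Let $\{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ be the basis $\mathcal{B}$ for $V$. If $\mathbf{x} = r_1\mathbf{b}_1 + \cdots + r_n\mathbf{b}_n$, then

$$[\mathbf{x}]_{\mathcal{B}} = \begin{bmatrix} r_1 \\ \vdots \\ r_n \end{bmatrix}$$

and

$$T(\mathbf{x}) = T(r_1\mathbf{b}_1 + \cdots + r_n\mathbf{b}_n) = r_1T(\mathbf{b}_1) + \cdots + r_nT(\mathbf{b}_n) \qquad (1)$$

because $T$ is linear. Now, since the coordinate mapping from $W$ to $\mathbb{R}^m$ is linear (Theorem 8 in Section 4.4), equation (1) leads to

$$[T(\mathbf{x})]_{\mathcal{C}} = r_1[T(\mathbf{b}_1)]_{\mathcal{C}} + \cdots + r_n[T(\mathbf{b}_n)]_{\mathcal{C}} \qquad (2)$$

Since $\mathcal{C}$-coordinate vectors are in $\mathbb{R}^m$, the vector equation (2) can be written as a matrix equation, namely,

$$[T(\mathbf{x})]_{\mathcal{C}} = M[\mathbf{x}]_{\mathcal{B}} \qquad (3)$$

where

$$M = \big[\,[T(\mathbf{b}_1)]_{\mathcal{C}} \ \ [T(\mathbf{b}_2)]_{\mathcal{C}} \ \cdots \ [T(\mathbf{b}_n)]_{\mathcal{C}}\,\big] \qquad (4)$$

The matrix $M$ is a matrix representation of $T$, called the matrix for $T$ relative to the bases $\mathcal{B}$ and $\mathcal{C}$. See Figure 2.

FIGURE 2 (Diagram: $T$ carries $\mathbf{x}$ to $T(\mathbf{x})$; multiplication by $M$ carries $[\mathbf{x}]_{\mathcal{B}}$ to $[T(\mathbf{x})]_{\mathcal{C}}$.)

Equation (3) says that, so far as coordinate vectors are concerned, the action of $T$ on $\mathbf{x}$ may be viewed as left-multiplication by $M$.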


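EXAMPLE 1 Suppose $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$ is a basis for $V$ and $\mathcal{C} = \{\mathbf{c}_1, \mathbf{c}_2, \mathbf{c}_3\}$ is a basis for $W$. Let $T : V \to W$ be a linear transformation with the property that

$$T(\mathbf{b}_1) = 3\mathbf{c}_1 - 2\mathbf{c}_2 + 5\mathbf{c}_3 \quad \text{and} \quad T(\mathbf{b}_2) = 4\mathbf{c}_1 + 7\mathbf{c}_2 - \mathbf{c}_3$$

Find the matrix $M$ for $T$ relative to $\mathcal{B}$ and $\mathcal{C}$.

SOLUTION The $\mathcal{C}$-coordinate vectors of the images of $\mathbf{b}_1$ and $\mathbf{b}_2$ are

$$[T(\mathbf{b}_1)]_{\mathcal{C}} = \begin{bmatrix} 3 \\ -2 \\ 5 \end{bmatrix} \quad \text{and} \quad [T(\mathbf{b}_2)]_{\mathcal{C}} = \begin{bmatrix} 4 \\ 7 \\ -1 \end{bmatrix}$$

Hence

$$M = \begin{bmatrix} 3 & 4 \\ -2 & 7 \\ 5 & -1 \end{bmatrix}$$ ■

If $\mathcal{B}$ and $\mathcal{C}$ are bases for the same space $V$ and if $T$ is the identity transformation $T(\mathbf{x}) = \mathbf{x}$ for $\mathbf{x}$ in $V$, then matrix $M$ in (4) is just a change-of-coordinates matrix (see Section 4.7).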
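
The construction in equation (4), placing the $\mathcal{C}$-coordinate vectors of the images side by side as columns, can be sketched in a few lines of Python. The helper names `matrix_relative_to_bases` and `apply` are hypothetical, the data are those of Example 1, and the test vector $\mathbf{x} = 2\mathbf{b}_1 - \mathbf{b}_2$ is an arbitrary choice made only for illustration.

```python
def matrix_relative_to_bases(image_coords):
    """Build M from its columns [T(b_1)]_C, ..., [T(b_n)]_C (equation (4))."""
    m = len(image_coords[0])
    return [[col[i] for col in image_coords] for i in range(m)]


def apply(M, coords):
    """Left-multiply a coordinate vector by M (equation (3))."""
    return [sum(a * x for a, x in zip(row, coords)) for row in M]


T_b1 = [3, -2, 5]      # C-coordinates of T(b1) = 3c1 - 2c2 + 5c3
T_b2 = [4, 7, -1]      # C-coordinates of T(b2) = 4c1 + 7c2 - c3
M = matrix_relative_to_bases([T_b1, T_b2])

# Illustrative vector x = 2*b1 - b2, so [x]_B = (2, -1).
image = apply(M, [2, -1])

# The same result follows directly from linearity: 2*T(b1) - T(b2).
by_linearity = [2 * u - v for u, v in zip(T_b1, T_b2)]
print(M, image, image == by_linearity)
```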


Linear Transformations from V into V 


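In the common case where $W$ is the same as $V$ and the basis $\mathcal{C}$ is the same as $\mathcal{B}$, the matrix $M$ in (4) is called the matrix for $T$ relative to $\mathcal{B}$, or simply the $\mathcal{B}$-matrix for $T$, and is denoted by $[T]_{\mathcal{B}}$. See Figure 3.

The $\mathcal{B}$-matrix for $T : V \to V$ satisfies

$$[T(\mathbf{x})]_{\mathcal{B}} = [T]_{\mathcal{B}}[\mathbf{x}]_{\mathcal{B}}, \quad \text{for all } \mathbf{x} \text{ in } V$$

FIGURE 3 (Diagram: $T$ carries $\mathbf{x}$ to $T(\mathbf{x})$; multiplication by $[T]_{\mathcal{B}}$ carries $[\mathbf{x}]_{\mathcal{B}}$ to $[T(\mathbf{x})]_{\mathcal{B}}$.)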


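EXAMPLE 2 The mapping $T : \mathbb{P}_2 \to \mathbb{P}_2$ defined by

$$T(a_0 + a_1t + a_2t^2) = a_1 + 2a_2t$$

is a linear transformation. (Calculus students will recognize $T$ as the differentiation operator.)

a. Find the $\mathcal{B}$-matrix for $T$, when $\mathcal{B}$ is the basis $\{1, t, t^2\}$.

b. Verify that $[T(\mathbf{p})]_{\mathcal{B}} = [T]_{\mathcal{B}}[\mathbf{p}]_{\mathcal{B}}$ for each $\mathbf{p}$ in $\mathbb{P}_2$.

SOLUTION

a. Compute the images of the basis vectors:

$T(1) = 0$ (the zero polynomial)

$T(t) = 1$ (the polynomial whose value is always 1)

$T(t^2) = 2t$

Then write the $\mathcal{B}$-coordinate vectors of $T(1)$, $T(t)$, and $T(t^2)$ (which are found by inspection in this example) and place them together as the $\mathcal{B}$-matrix for $T$:

$$[T(1)]_{\mathcal{B}} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \quad [T(t)]_{\mathcal{B}} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad [T(t^2)]_{\mathcal{B}} = \begin{bmatrix} 0 \\ 2 \\ 0 \end{bmatrix}$$

$$[T]_{\mathcal{B}} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{bmatrix}$$

b. For a general $\mathbf{p}(t) = a_0 + a_1t + a_2t^2$,

$$[T(\mathbf{p})]_{\mathcal{B}} = [a_1 + 2a_2t]_{\mathcal{B}} = \begin{bmatrix} a_1 \\ 2a_2 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = [T]_{\mathcal{B}}[\mathbf{p}]_{\mathcal{B}}$$

See Figure 4. ■

FIGURE 4 Matrix representation of a linear transformation.


Linear Transformations on $\mathbb{R}^n$

In an applied problem involving $\mathbb{R}^n$, a linear transformation $T$ usually appears first as a matrix transformation, $\mathbf{x} \mapsto A\mathbf{x}$. If $A$ is diagonalizable, then there is a basis $\mathcal{B}$ for $\mathbb{R}^n$ consisting of eigenvectors of $A$. Theorem 8 below shows that, in this case, the $\mathcal{B}$-matrix for $T$ is diagonal. Diagonalizing $A$ amounts to finding a diagonal matrix representation of $\mathbf{x} \mapsto A\mathbf{x}$.


THEOREM 8 Diagonal Matrix Representation

Suppose $A = PDP^{-1}$, where $D$ is a diagonal $n \times n$ matrix. If $\mathcal{B}$ is the basis for $\mathbb{R}^n$ formed from the columns of $P$, then $D$ is the $\mathcal{B}$-matrix for the transformation $\mathbf{x} \mapsto A\mathbf{x}$.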


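PROOF Denote the columns of $P$ by $\mathbf{b}_1, \ldots, \mathbf{b}_n$, so that $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ and $P = [\,\mathbf{b}_1 \ \cdots \ \mathbf{b}_n\,]$. In this case, $P$ is the change-of-coordinates matrix $P_{\mathcal{B}}$ discussed in Section 4.4, where

$$P[\mathbf{x}]_{\mathcal{B}} = \mathbf{x} \quad \text{and} \quad [\mathbf{x}]_{\mathcal{B}} = P^{-1}\mathbf{x}$$

If $T(\mathbf{x}) = A\mathbf{x}$ for $\mathbf{x}$ in $\mathbb{R}^n$, then

$$\begin{aligned}
[T]_{\mathcal{B}} &= \big[\,[T(\mathbf{b}_1)]_{\mathcal{B}} \ \cdots \ [T(\mathbf{b}_n)]_{\mathcal{B}}\,\big] && \text{Definition of } [T]_{\mathcal{B}} \\
&= \big[\,[A\mathbf{b}_1]_{\mathcal{B}} \ \cdots \ [A\mathbf{b}_n]_{\mathcal{B}}\,\big] && \text{Since } T(\mathbf{x}) = A\mathbf{x} \\
&= [\,P^{-1}A\mathbf{b}_1 \ \cdots \ P^{-1}A\mathbf{b}_n\,] && \text{Change of coordinates} \\
&= P^{-1}A[\,\mathbf{b}_1 \ \cdots \ \mathbf{b}_n\,] && \text{Matrix multiplication} \\
&= P^{-1}AP && (6)
\end{aligned}$$

Since $A = PDP^{-1}$, we have $[T]_{\mathcal{B}} = P^{-1}AP = D$. ■


EXAMPLE 3 Define $T : \mathbb{R}^2 \to \mathbb{R}^2$ by $T(\mathbf{x}) = A\mathbf{x}$, where $A = \begin{bmatrix} 7 & 2 \\ -4 & 1 \end{bmatrix}$. Find a basis $\mathcal{B}$ for $\mathbb{R}^2$ with the property that the $\mathcal{B}$-matrix for $T$ is a diagonal matrix.

SOLUTION From Example 2 in Section 5.3, we know that $A = PDP^{-1}$, where

$$P = \begin{bmatrix} 1 & 1 \\ -1 & -2 \end{bmatrix} \quad \text{and} \quad D = \begin{bmatrix} 5 & 0 \\ 0 & 3 \end{bmatrix}$$

The columns of $P$, call them $\mathbf{b}_1$ and $\mathbf{b}_2$, are eigenvectors of $A$. By Theorem 8, $D$ is the $\mathcal{B}$-matrix for $T$ when $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$. The mappings $\mathbf{x} \mapsto A\mathbf{x}$ and $\mathbf{u} \mapsto D\mathbf{u}$ describe the same linear transformation, relative to different bases. ■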
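
The conclusion of Theorem 8, $[T]_{\mathcal{B}} = P^{-1}AP$, can be confirmed numerically for a $2 \times 2$ case. The Python sketch below is illustrative only: the helper names `inverse_2x2` and `matmul` are hypothetical, and the values $A = \begin{bmatrix} 7 & 2 \\ -4 & 1 \end{bmatrix}$ and $P = \begin{bmatrix} 1 & 1 \\ -1 & -2 \end{bmatrix}$ are taken as the matrices of Example 2 in Section 5.3, which is an assumption of this sketch.

```python
from fractions import Fraction


def inverse_2x2(M):
    """Inverse of a 2x2 matrix by the standard adjugate formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[7, 2], [-4, 1]]      # assumed standard matrix of T
P = [[1, 1], [-1, -2]]     # columns b1, b2: the assumed eigenvector basis

P_inv = inverse_2x2(P)
B_matrix = matmul(matmul(P_inv, A), P)    # [T]_B = P^{-1} A P
print(B_matrix)
```

The product comes out diagonal, with the eigenvalues in the same order as the eigenvector columns of $P$.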


Similarity of Matrix Representations 

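The proof of Theorem 8 did not use the information that $D$ was diagonal. Hence, if $A$ is similar to a matrix $C$, with $A = PCP^{-1}$, then $C$ is the $\mathcal{B}$-matrix for the transformation $\mathbf{x} \mapsto A\mathbf{x}$ when the basis $\mathcal{B}$ is formed from the columns of $P$. The factorization $A = PCP^{-1}$ is shown in Figure 5.

FIGURE 5 Similarity of two matrix representations: $A = PCP^{-1}$. (Diagram: multiplication by $A$ carries $\mathbf{x}$ to $A\mathbf{x}$; multiplication by $P^{-1}$ carries $\mathbf{x}$ to $[\mathbf{x}]_{\mathcal{B}}$; multiplication by $C$ carries $[\mathbf{x}]_{\mathcal{B}}$ to $[A\mathbf{x}]_{\mathcal{B}}$; multiplication by $P$ carries $[A\mathbf{x}]_{\mathcal{B}}$ to $A\mathbf{x}$.)

Conversely, if $T : \mathbb{R}^n \to \mathbb{R}^n$ is defined by $T(\mathbf{x}) = A\mathbf{x}$, and if $\mathcal{B}$ is any basis for $\mathbb{R}^n$, then the $\mathcal{B}$-matrix for $T$ is similar to $A$. In fact, the calculations in the proof of Theorem 8 show that if $P$ is the matrix whose columns come from the vectors in $\mathcal{B}$, then $[T]_{\mathcal{B}} = P^{-1}AP$. Thus, the set of all matrices similar to a matrix $A$ coincides with the set of all matrix representations of the transformation $\mathbf{x} \mapsto A\mathbf{x}$.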


EXAMPLE 4 


Let A = 






, and b 2 




. The characteristic 


polynomial of $A$ is $(\lambda + 2)^2$, but the eigenspace for the eigenvalue $-2$ is only one-dimensional; so $A$ is not diagonalizable. However, the basis $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$ has the property that the $\mathcal{B}$-matrix for the transformation $\mathbf{x} \mapsto A\mathbf{x}$ is a triangular matrix called the Jordan form of $A$.¹ Find this $\mathcal{B}$-matrix.


SOLUTION If $P = [\,\mathbf{b}_1 \ \ \mathbf{b}_2\,]$, then the $\mathcal{B}$-matrix is $P^{-1}AP$. Compute


AP = 
P~ l AP = 



Notice that the eigenvalue of A is on the diagonal. 


¹ Every square matrix $A$ is similar to a matrix in Jordan form. The basis used to produce a Jordan form consists of eigenvectors and so-called "generalized eigenvectors" of $A$. See Chapter 9 of Applied Linear Algebra, 3rd ed. (Englewood Cliffs, NJ: Prentice-Hall, 1988), by B. Noble and J. W. Daniel.

NUMERICAL NOTE

An efficient way to compute a $\mathcal{B}$-matrix $P^{-1}AP$ is to compute $AP$ and then to row reduce the augmented matrix $[\,P \ \ AP\,]$ to $[\,I \ \ P^{-1}AP\,]$. A separate computation of $P^{-1}$ is unnecessary. See Exercise 12 in Section 2.2.
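
The row-reduction shortcut in the Numerical Note can be sketched as follows. This is a minimal Python illustration using exact fractions, not the routine of any particular matrix program; the function name `b_matrix_by_row_reduction` is hypothetical, and the sample data are the matrices $A$ and $P$ of Example 3 in Section 5.3.

```python
from fractions import Fraction


def b_matrix_by_row_reduction(A, P):
    """Row reduce [P  AP] to [I  P^{-1}AP] and return the right-hand block."""
    n = len(A)
    AP = [[sum(A[i][k] * P[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    aug = [[Fraction(x) for x in P[i] + AP[i]] for i in range(n)]
    for c in range(n):
        pivot = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if pivot is None:
            raise ValueError("P is not invertible")
        aug[c], aug[pivot] = aug[pivot], aug[c]      # row interchange
        pv = aug[c][c]
        aug[c] = [x / pv for x in aug[c]]            # scale pivot row to 1
        for r in range(n):
            if r != c and aug[r][c] != 0:            # clear the column
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


A = [[1, 3, 3], [-3, -5, -3], [3, 3, 1]]
P = [[1, -1, -1], [-1, 1, 0], [1, 0, 1]]
B_matrix = b_matrix_by_row_reduction(A, P)
print(B_matrix)
```

Because the same row operations that turn $P$ into $I$ are applied to $AP$, the right-hand block ends up equal to $P^{-1}(AP)$ without $P^{-1}$ ever being formed.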


PRACTICE PROBLEMS 



1. Find $T(a_0 + a_1t + a_2t^2)$, if $T$ is the linear transformation from $\mathbb{P}_2$ to $\mathbb{P}_2$ whose matrix relative to $\mathcal{B} = \{1, t, t^2\}$ is

$$[T]_{\mathcal{B}} = \begin{bmatrix} 3 & 4 & 0 \\ 0 & 5 & -1 \\ 1 & -2 & 7 \end{bmatrix}$$

2. Let $A$, $B$, and $C$ be $n \times n$ matrices. The text has shown that if $A$ is similar to $B$, then $B$ is similar to $A$. This property, together with the statements below, shows that "similar to" is an equivalence relation. (Row equivalence is another example of an equivalence relation.) Verify parts (a) and (b).

a. $A$ is similar to $A$.

b. If $A$ is similar to $B$ and $B$ is similar to $C$, then $A$ is similar to $C$.


5.4 EXERCISES 


1. Let $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3\}$ and $\mathcal{D} = \{\mathbf{d}_1, \mathbf{d}_2\}$ be bases for vector spaces $V$ and $W$, respectively. Let $T : V \to W$ be a linear transformation with the property that

$$T(\mathbf{b}_1) = 3\mathbf{d}_1 - 5\mathbf{d}_2, \quad T(\mathbf{b}_2) = -\mathbf{d}_1 + 6\mathbf{d}_2, \quad T(\mathbf{b}_3) = 4\mathbf{d}_2$$

Find the matrix for $T$ relative to $\mathcal{B}$ and $\mathcal{D}$.

2. Let $\mathcal{D} = \{\mathbf{d}_1, \mathbf{d}_2\}$ and $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$ be bases for vector spaces $V$ and $W$, respectively. Let $T : V \to W$ be a linear transformation with the property that

$$T(\mathbf{d}_1) = 2\mathbf{b}_1 - 3\mathbf{b}_2, \quad T(\mathbf{d}_2) = -4\mathbf{b}_1 + 5\mathbf{b}_2$$

Find the matrix for $T$ relative to $\mathcal{D}$ and $\mathcal{B}$.

3. Let $\mathcal{E} = \{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ be the standard basis for $\mathbb{R}^3$, $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3\}$ be a basis for a vector space $V$, and $T : \mathbb{R}^3 \to V$ be a linear transformation with the property that

$$T(x_1, x_2, x_3) = (x_3 - x_2)\mathbf{b}_1 - (x_1 + x_3)\mathbf{b}_2 + (x_1 - x_2)\mathbf{b}_3$$

a. Compute $T(\mathbf{e}_1)$, $T(\mathbf{e}_2)$, and $T(\mathbf{e}_3)$.

b. Compute $[T(\mathbf{e}_1)]_{\mathcal{B}}$, $[T(\mathbf{e}_2)]_{\mathcal{B}}$, and $[T(\mathbf{e}_3)]_{\mathcal{B}}$.

c. Find the matrix for $T$ relative to $\mathcal{E}$ and $\mathcal{B}$.

4. Let $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3\}$ be a basis for a vector space $V$ and $T : V \to \mathbb{R}^2$ be a linear transformation with the property that

$$T(x_1\mathbf{b}_1 + x_2\mathbf{b}_2 + x_3\mathbf{b}_3) = \begin{bmatrix} 2x_1 - 4x_2 + 5x_3 \\ -x_2 + 3x_3 \end{bmatrix}$$

Find the matrix for $T$ relative to $\mathcal{B}$ and the standard basis for $\mathbb{R}^2$.

5. Let $T : \mathbb{P}_2 \to \mathbb{P}_3$ be the transformation that maps a polynomial $\mathbf{p}(t)$ into the polynomial $(t + 5)\mathbf{p}(t)$.

a. Find the image of $\mathbf{p}(t) = 2 - t + t^2$.

b. Show that $T$ is a linear transformation.

c. Find the matrix for $T$ relative to the bases $\{1, t, t^2\}$ and $\{1, t, t^2, t^3\}$.

6. Let $T : \mathbb{P}_2 \to \mathbb{P}_4$ be the transformation that maps a polynomial $\mathbf{p}(t)$ into the polynomial $\mathbf{p}(t) + t^2\mathbf{p}(t)$.

a. Find the image of $\mathbf{p}(t) = 2 - t + t^2$.

b. Show that $T$ is a linear transformation.

c. Find the matrix for $T$ relative to the bases $\{1, t, t^2\}$ and $\{1, t, t^2, t^3, t^4\}$.

7. Assume the mapping $T : \mathbb{P}_2 \to \mathbb{P}_2$ defined by

$$T(a_0 + a_1t + a_2t^2) = 3a_0 + (5a_0 - 2a_1)t + (4a_1 + a_2)t^2$$

is linear. Find the matrix representation of $T$ relative to the basis $\mathcal{B} = \{1, t, t^2\}$.

8. Let $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3\}$ be a basis for a vector space $V$. Find $T(3\mathbf{b}_1 - 4\mathbf{b}_2)$ when $T$ is a linear transformation from $V$ to $V$ whose matrix relative to $\mathcal{B}$ is

$$[T]_{\mathcal{B}} = \begin{bmatrix} 0 & -6 & 1 \\ 0 & 5 & -1 \\ 1 & -2 & 7 \end{bmatrix}$$

9. Define $T : \mathbb{P}_2 \to \mathbb{R}^3$ by $T(\mathbf{p}) = \begin{bmatrix} \mathbf{p}(-1) \\ \mathbf{p}(0) \\ \mathbf{p}(1) \end{bmatrix}$.

a. Find the image under $T$ of $\mathbf{p}(t) = 5 + 3t$.

b. Show that $T$ is a linear transformation.

c. Find the matrix for $T$ relative to the basis $\{1, t, t^2\}$ for $\mathbb{P}_2$ and the standard basis for $\mathbb{R}^3$.

10. Define $T : \mathbb{P}_3 \to \mathbb{R}^4$ by $T(\mathbf{p}) = \begin{bmatrix} \mathbf{p}(-3) \\ \mathbf{p}(-1) \\ \mathbf{p}(1) \\ \mathbf{p}(3) \end{bmatrix}$.

a. Show that $T$ is a linear transformation.

b. Find the matrix for $T$ relative to the basis $\{1, t, t^2, t^3\}$ for $\mathbb{P}_3$ and the standard basis for $\mathbb{R}^4$.

22. If $A$ is diagonalizable and $B$ is similar to $A$, then $B$ is also diagonalizable.

23. If $B = P^{-1}AP$ and $\mathbf{x}$ is an eigenvector of $A$ corresponding to an eigenvalue $\lambda$, then $P^{-1}\mathbf{x}$ is an eigenvector of $B$ corresponding also to $\lambda$.

24. If $A$ and $B$ are similar, then they have the same rank. [Hint: Refer to Supplementary Exercises 13 and 14 for Chapter 4.]

25. The trace of a square matrix $A$ is the sum of the diagonal entries in $A$ and is denoted by $\operatorname{tr} A$. It can be verified that $\operatorname{tr}(FG) = \operatorname{tr}(GF)$ for any two $n \times n$ matrices $F$ and $G$. Show that if $A$ and $B$ are similar, then $\operatorname{tr} A = \operatorname{tr} B$.

In Exercises 11 and 12, find the $\mathcal{B}$-matrix for the transformation $\mathbf{x} \mapsto A\mathbf{x}$, when $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$.


11 . A = 

12. A = 



In Exercises 13-16, define $T : \mathbb{R}^2 \to \mathbb{R}^2$ by $T(\mathbf{x}) = A\mathbf{x}$. Find a basis $\mathcal{B}$ for $\mathbb{R}^2$ with the property that $[T]_{\mathcal{B}}$ is diagonal.


13. A = 


15. A = 


1-1 

OJ o 

1 " 
4 

14. 

A 

— 

5 

_ —7 

-3" 

1 

4 

-1 

-2" 

3 _ 

16. 

A 

— 

2 

-1 

-6" 

3 _ 


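17. Let $A = \begin{bmatrix} 1 & 1 \\ -1 & 3 \end{bmatrix}$ and $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$, for $\mathbf{b}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, $\mathbf{b}_2 = \begin{bmatrix} 5 \\ 4 \end{bmatrix}$. Define $T : \mathbb{R}^2 \to \mathbb{R}^2$ by $T(\mathbf{x}) = A\mathbf{x}$.

a. Verify that $\mathbf{b}_1$ is an eigenvector of $A$ but $A$ is not diagonalizable.

b. Find the $\mathcal{B}$-matrix for $T$.

18. Define $T : \mathbb{R}^3 \to \mathbb{R}^3$ by $T(\mathbf{x}) = A\mathbf{x}$, where $A$ is a $3 \times 3$ matrix with eigenvalues 5 and $-2$. Does there exist a basis $\mathcal{B}$ for $\mathbb{R}^3$ such that the $\mathcal{B}$-matrix for $T$ is a diagonal matrix? Discuss.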


Verify the statements in Exercises 19-24. The matrices are square.

19. If $A$ is invertible and similar to $B$, then $B$ is invertible and $A^{-1}$ is similar to $B^{-1}$. [Hint: $P^{-1}AP = B$ for some invertible $P$. Explain why $B$ is invertible. Then find an invertible $Q$ such that $Q^{-1}A^{-1}Q = B^{-1}$.]

20. If $A$ is similar to $B$, then $A^2$ is similar to $B^2$.

21. If $B$ is similar to $A$ and $C$ is similar to $A$, then $B$ is similar to $C$.

26. It can be shown that the trace of a matrix $A$ equals the sum of the eigenvalues of $A$. Verify this statement for the case when $A$ is diagonalizable.

27. Let $V$ be $\mathbb{R}^n$ with a basis $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$; let $W$ be $\mathbb{R}^n$ with the standard basis, denoted here by $\mathcal{E}$; and consider the identity transformation $I : V \to W$, where $I(\mathbf{x}) = \mathbf{x}$. Find the matrix for $I$ relative to $\mathcal{B}$ and $\mathcal{E}$. What was this matrix called in Section 4.4?

28. Let $V$ be a vector space with a basis $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$, $W$ be the same space as $V$ with a basis $\mathcal{C} = \{\mathbf{c}_1, \ldots, \mathbf{c}_n\}$, and $I$ be the identity transformation $I : V \to W$. Find the matrix for $I$ relative to $\mathcal{B}$ and $\mathcal{C}$. What was this matrix called in Section 4.7?

29. Let $V$ be a vector space with a basis $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$. Find the $\mathcal{B}$-matrix for the identity transformation $I : V \to V$.


[M] In Exercises 30 and 31, find the $\mathcal{B}$-matrix for the transformation $\mathbf{x} \mapsto A\mathbf{x}$ when $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3\}$.


30. A = 


14 

33 

11 


4 

9 

-4 


14 

31 

11 



1 

1_ 


1 

1_ 


1 

1_ 

b, = 

-2 

1 

b 2 = 

1 

_1 

b = 

1 

<N O 

_1 


31. A = 


7 

1 

3 


-48 

14 

-45 


16 

6 

19 


bi = 


~-3“ 


1 

(NJ 
_ 1 


1 

UJ 

_ 1 

1 

-3 

b 2 = 

1 

1 _ 

b = 

-1 

0_ 


32. [M] Let $T$ be the transformation whose standard matrix is given below. Find a basis $\mathcal{B}$ for $\mathbb{R}^4$ with the property that $[T]_{\mathcal{B}}$ is diagonal.



15 

-66 

-44 

-33 

0 

13 

21 

-15 

1 

-15 

-21 

12 

2 

-18 

-22 

8

SOLUTIONS TO PRACTICE PROBLEMS

1. Let $\mathbf{p}(t) = a_0 + a_1t + a_2t^2$ and compute

$$[T(\mathbf{p})]_{\mathcal{B}} = [T]_{\mathcal{B}}[\mathbf{p}]_{\mathcal{B}} = \begin{bmatrix} 3 & 4 & 0 \\ 0 & 5 & -1 \\ 1 & -2 & 7 \end{bmatrix}\begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} 3a_0 + 4a_1 \\ 5a_1 - a_2 \\ a_0 - 2a_1 + 7a_2 \end{bmatrix}$$

So $T(\mathbf{p}) = (3a_0 + 4a_1) + (5a_1 - a_2)t + (a_0 - 2a_1 + 7a_2)t^2$.
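
The computation in this solution is just a matrix-vector product on coefficient vectors, which the following Python sketch makes explicit. The helper name `apply_b_matrix` is hypothetical, and the polynomial $\mathbf{p}(t) = 1 + 2t + 3t^2$ is an arbitrary choice made for illustration; the second matrix is the differentiation matrix from Example 2.

```python
def apply_b_matrix(T_B, coeffs):
    """Return [T(p)]_B = [T]_B [p]_B for the basis B = {1, t, t^2}."""
    return [sum(a * c for a, c in zip(row, coeffs)) for row in T_B]


T_B = [[3, 4, 0], [0, 5, -1], [1, -2, 7]]     # matrix from Practice Problem 1
D_B = [[0, 1, 0], [0, 0, 2], [0, 0, 0]]       # differentiation, Example 2

p = [1, 2, 3]                  # illustrative p(t) = 1 + 2t + 3t^2

image = apply_b_matrix(T_B, p)        # coefficients of T(p)
derivative = apply_b_matrix(D_B, p)   # coefficients of p'(t)
print(image, derivative)
```

Reading each output list as coefficients of $1, t, t^2$ gives the image polynomial; for the differentiation matrix the result is $2 + 6t$, the derivative of $1 + 2t + 3t^2$.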


2. a. $A = (I)^{-1}AI$, so $A$ is similar to $A$.

b. By hypothesis, there exist invertible matrices $P$ and $Q$ with the property that $B = P^{-1}AP$ and $C = Q^{-1}BQ$. Substitute the formula for $B$ into the formula for $C$, and use a fact about the inverse of a product:

$$C = Q^{-1}BQ = Q^{-1}(P^{-1}AP)Q = (PQ)^{-1}A(PQ)$$

This equation has the proper form to show that $A$ is similar to $C$.

5.5 COMPLEX EIGENVALUES 


Since the characteristic equation of an $n \times n$ matrix involves a polynomial of degree $n$,
the equation always has exactly $n$ roots, counting multiplicities, provided that possibly
complex roots are included. This section shows that if the characteristic equation of
a real matrix $A$ has some complex roots, then these roots provide critical information
about $A$. The key is to let $A$ act on the space $\mathbb{C}^n$ of $n$-tuples of complex numbers.¹

Our interest in $\mathbb{C}^n$ does not arise from a desire to “generalize” the results of the
earlier chapters, although that would in fact open up significant new applications of
linear algebra.² Rather, this study of complex eigenvalues is essential in order to uncover
“hidden” information about certain matrices with real entries that arise in a variety of
real-life problems. Such problems include many real dynamical systems that involve
periodic motion, vibration, or some type of rotation in space.

The matrix eigenvalue-eigenvector theory already developed for $\mathbb{R}^n$ applies
equally well to $\mathbb{C}^n$. So a complex scalar $\lambda$ satisfies $\det(A - \lambda I) = 0$ if and only if there
is a nonzero vector $\mathbf{x}$ in $\mathbb{C}^n$ such that $A\mathbf{x} = \lambda\mathbf{x}$. We call $\lambda$ a (complex) eigenvalue and
$\mathbf{x}$ a (complex) eigenvector corresponding to $\lambda$.


EXAMPLE 1 If $A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$, then the linear transformation $\mathbf{x} \mapsto A\mathbf{x}$ on $\mathbb{R}^2$
rotates the plane counterclockwise through a quarter-turn. The action of $A$ is periodic,
since after four quarter-turns, a vector is back where it started. Obviously, no nonzero
vector is mapped into a multiple of itself, so $A$ has no eigenvectors in $\mathbb{R}^2$ and hence no
real eigenvalues. In fact, the characteristic equation of $A$ is

\[
\lambda^2 + 1 = 0
\]



¹ Refer to Appendix B for a brief discussion of complex numbers. Matrix algebra and concepts about
real vector spaces carry over to the case with complex entries and scalars. In particular, $A(c\mathbf{x} + d\mathbf{y}) = cA\mathbf{x} + dA\mathbf{y}$,
for $A$ an $m \times n$ matrix with complex entries, $\mathbf{x}$, $\mathbf{y}$ in $\mathbb{C}^n$, and $c$, $d$ in $\mathbb{C}$.

² A second course in linear algebra often discusses such topics. They are of particular importance in
electrical engineering.
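The only roots are complex: $\lambda = i$ and $\lambda = -i$. However, if we permit $A$ to act on $\mathbb{C}^2$,
then

\[
\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ -i \end{bmatrix}
= \begin{bmatrix} i \\ 1 \end{bmatrix} = i\begin{bmatrix} 1 \\ -i \end{bmatrix},
\qquad
\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ i \end{bmatrix}
= \begin{bmatrix} -i \\ 1 \end{bmatrix} = -i\begin{bmatrix} 1 \\ i \end{bmatrix}
\]

Thus $i$ and $-i$ are eigenvalues, with $\begin{bmatrix} 1 \\ -i \end{bmatrix}$ and $\begin{bmatrix} 1 \\ i \end{bmatrix}$ as corresponding eigenvectors.
(A method for finding complex eigenvectors is discussed in Example 2.)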
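
The calculation in Example 1 can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `eigenvalues_2x2` and `matvec` are illustrative and are not part of the text. It solves the characteristic equation of a 2 x 2 matrix with the quadratic formula and then tests one eigenvector directly.

```python
import cmath

def eigenvalues_2x2(A):
    """Roots of the characteristic polynomial of a 2x2 matrix A (nested lists)."""
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    # lambda^2 - trace*lambda + det = 0, solved with the quadratic formula
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2

def matvec(A, x):
    """Product of a 2x2 matrix and a vector with (possibly complex) entries."""
    return [A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1]]

A = [[0, -1], [1, 0]]            # quarter-turn rotation from Example 1
lam1, lam2 = eigenvalues_2x2(A)
print(lam1, lam2)

v = [1, -1j]                     # candidate eigenvector for lambda = i
Av = matvec(A, v)
print(Av, [1j * v[0], 1j * v[1]])  # the two lists should agree
```

Testing a candidate eigenvector by multiplication, as in the last two lines, avoids any row reduction over the complex numbers.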


The main focus of this section will be on the matrix in the next example. 


EXAMPLE 2 Let $A = \begin{bmatrix} .5 & -.6 \\ .75 & 1.1 \end{bmatrix}$. Find the eigenvalues of $A$, and find a basis
for each eigenspace.


SOLUTION The characteristic equation of $A$ is

\[
0 = \det\begin{bmatrix} .5 - \lambda & -.6 \\ .75 & 1.1 - \lambda \end{bmatrix}
= (.5 - \lambda)(1.1 - \lambda) - (-.6)(.75)
= \lambda^2 - 1.6\lambda + 1
\]

From the quadratic formula, $\lambda = \tfrac{1}{2}\left[1.6 \pm \sqrt{(-1.6)^2 - 4}\,\right] = .8 \pm .6i$. For the eigenvalue
$\lambda = .8 - .6i$, construct

\[
A - (.8 - .6i)I
= \begin{bmatrix} .5 & -.6 \\ .75 & 1.1 \end{bmatrix} - \begin{bmatrix} .8 - .6i & 0 \\ 0 & .8 - .6i \end{bmatrix}
= \begin{bmatrix} -.3 + .6i & -.6 \\ .75 & .3 + .6i \end{bmatrix}
\tag{1}
\]

Row reduction of the usual augmented matrix is quite unpleasant by hand because of the
complex arithmetic. However, here is a nice observation that really simplifies matters:
Since $.8 - .6i$ is an eigenvalue, the system

\[
\begin{aligned}
(-.3 + .6i)x_1 - .6x_2 &= 0 \\
.75x_1 + (.3 + .6i)x_2 &= 0
\end{aligned}
\tag{2}
\]

has a nontrivial solution (with $x_1$ and $x_2$ possibly complex numbers). Therefore, both
equations in (2) determine the same relationship between $x_1$ and $x_2$, and either equation
can be used to express one variable in terms of the other.³

The second equation in (2) leads to

\[
\begin{aligned}
.75x_1 &= (-.3 - .6i)x_2 \\
x_1 &= (-.4 - .8i)x_2
\end{aligned}
\]

Choose $x_2 = 5$ to eliminate the decimals, and obtain $x_1 = -2 - 4i$. A basis for the
eigenspace corresponding to $\lambda = .8 - .6i$ is

\[
\mathbf{v}_1 = \begin{bmatrix} -2 - 4i \\ 5 \end{bmatrix}
\]


³ Another way to see this is to realize that the matrix in equation (1) is not invertible, so its rows are linearly
dependent (as vectors in $\mathbb{C}^2$), and hence one row is a (complex) multiple of the other.
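Analogous calculations for $\lambda = .8 + .6i$ produce the eigenvector

\[
\mathbf{v}_2 = \begin{bmatrix} -2 + 4i \\ 5 \end{bmatrix}
\]

As a check on the work, compute

\[
A\mathbf{v}_2 = \begin{bmatrix} .5 & -.6 \\ .75 & 1.1 \end{bmatrix}\begin{bmatrix} -2 + 4i \\ 5 \end{bmatrix}
= \begin{bmatrix} -4 + 2i \\ 4 + 3i \end{bmatrix} = (.8 + .6i)\mathbf{v}_2 \quad ■
\]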
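
The shortcut used in Example 2 (read the eigenvector off a single row of $A - \lambda I$) is easy to mirror in code. The sketch below is an illustration under stated assumptions, not a general eigenvector routine: it assumes a 2 x 2 matrix whose (2,1)-entry is nonzero, and the function names are hypothetical.

```python
import cmath

A = [[0.5, -0.6], [0.75, 1.1]]

# Eigenvalues from lambda^2 - (trace)lambda + det = 0
trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
root = cmath.sqrt(trace ** 2 - 4 * det)
lam_minus = (trace - root) / 2      # approximately .8 - .6i
lam_plus = (trace + root) / 2       # approximately .8 + .6i

def eigenvector_from_second_row(A, lam, x2=5):
    """Solve A[1][0]*x1 + (A[1][1] - lam)*x2 = 0 for x1, with x2 chosen freely."""
    x1 = -(A[1][1] - lam) * x2 / A[1][0]
    return [x1, x2]

def residual(A, lam, v):
    """Largest entry of |A v - lam v|; near zero for a true eigenpair."""
    Av = [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]
    return max(abs(Av[0] - lam * v[0]), abs(Av[1] - lam * v[1]))

v1 = eigenvector_from_second_row(A, lam_minus)
v2 = eigenvector_from_second_row(A, lam_plus)
print(v1, residual(A, lam_minus, v1))
print(v2, residual(A, lam_plus, v2))
```

Because the arithmetic is done in floating point, the residuals are small rather than exactly zero.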


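Surprisingly, the matrix $A$ in Example 2 determines a transformation $\mathbf{x} \mapsto A\mathbf{x}$ that
is essentially a rotation. This fact becomes evident when appropriate points are plotted.


EXAMPLE 3 One way to see how multiplication by the matrix $A$ in Example 2
affects points is to plot an arbitrary initial point, say $\mathbf{x}_0 = (2, 0)$, and then to plot
successive images of this point under repeated multiplications by $A$. That is, plot

\[
\begin{aligned}
\mathbf{x}_1 &= A\mathbf{x}_0 = \begin{bmatrix} .5 & -.6 \\ .75 & 1.1 \end{bmatrix}\begin{bmatrix} 2 \\ 0 \end{bmatrix} = \begin{bmatrix} 1.0 \\ 1.5 \end{bmatrix} \\
\mathbf{x}_2 &= A\mathbf{x}_1 = \begin{bmatrix} .5 & -.6 \\ .75 & 1.1 \end{bmatrix}\begin{bmatrix} 1.0 \\ 1.5 \end{bmatrix} = \begin{bmatrix} -.4 \\ 2.4 \end{bmatrix} \\
\mathbf{x}_3 &= A\mathbf{x}_2, \ \ldots
\end{aligned}
\]

Figure 1 shows $\mathbf{x}_0, \ldots, \mathbf{x}_8$ as larger dots. The smaller dots are the locations of $\mathbf{x}_9, \ldots,$
$\mathbf{x}_{100}$. The sequence lies along an elliptical orbit. ■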
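
The iterates described in Example 3 can be generated with a few lines of code. This is a minimal sketch, assuming the matrix of Example 2 and the initial point (2, 0); the function names `step` and `orbit` are illustrative. It prints the first few points and the smallest and largest distances from the origin, which stay within a fixed band because the points remain on a closed orbit.

```python
A = [[0.5, -0.6], [0.75, 1.1]]

def step(A, x):
    """One step x -> A x for a 2x2 matrix."""
    return [A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1]]

def orbit(A, x0, n):
    """Return [x0, x1, ..., xn] with x_{k+1} = A x_k."""
    points = [list(x0)]
    for _ in range(n):
        points.append(step(A, points[-1]))
    return points

points = orbit(A, [2.0, 0.0], 100)
for k in range(4):
    print(k, points[k])

# The orbit neither escapes to infinity nor collapses to 0
radii = [(x * x + y * y) ** 0.5 for x, y in points]
print(min(radii), max(radii))
```

Passing the list `points` to any plotting tool should reproduce the general shape of the figure.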


FIGURE 1 Iterates of a point $\mathbf{x}_0$ under the action of a matrix with a
complex eigenvalue.

Of course, Figure 1 does not explain why the rotation occurs. The secret to the 
rotation is hidden in the real and imaginary parts of a complex eigenvector. 


Real and Imaginary Parts of Vectors 

The complex conjugate of a complex vector $\mathbf{x}$ in $\mathbb{C}^n$ is the vector $\bar{\mathbf{x}}$ in $\mathbb{C}^n$ whose entries
are the complex conjugates of the entries in $\mathbf{x}$. The real and imaginary parts of a
complex vector $\mathbf{x}$ are the vectors $\operatorname{Re}\mathbf{x}$ and $\operatorname{Im}\mathbf{x}$ in $\mathbb{R}^n$ formed from the real and imaginary
parts of the entries of $\mathbf{x}$.






























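EXAMPLE 4 If $\mathbf{x} = \begin{bmatrix} 3 - i \\ i \\ 2 + 5i \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \\ 2 \end{bmatrix} + i\begin{bmatrix} -1 \\ 1 \\ 5 \end{bmatrix}$, then

\[
\operatorname{Re}\mathbf{x} = \begin{bmatrix} 3 \\ 0 \\ 2 \end{bmatrix}, \quad
\operatorname{Im}\mathbf{x} = \begin{bmatrix} -1 \\ 1 \\ 5 \end{bmatrix}, \quad \text{and} \quad
\bar{\mathbf{x}} = \begin{bmatrix} 3 \\ 0 \\ 2 \end{bmatrix} - i\begin{bmatrix} -1 \\ 1 \\ 5 \end{bmatrix}
= \begin{bmatrix} 3 + i \\ -i \\ 2 - 5i \end{bmatrix} \quad ■
\]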



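If $B$ is an $m \times n$ matrix with possibly complex entries, then $\bar{B}$ denotes the matrix
whose entries are the complex conjugates of the entries in $B$. Properties of conjugates
for complex numbers carry over to complex matrix algebra:

\[
\overline{r\mathbf{x}} = \bar{r}\,\bar{\mathbf{x}}, \quad
\overline{B\mathbf{x}} = \bar{B}\,\bar{\mathbf{x}}, \quad
\overline{BC} = \bar{B}\,\bar{C}, \quad \text{and} \quad
\overline{rB} = \bar{r}\,\bar{B}
\]

Eigenvalues and Eigenvectors of a Real Matrix
That Acts on $\mathbb{C}^n$

Let $A$ be an $n \times n$ matrix whose entries are real. Then $\overline{A\mathbf{x}} = \bar{A}\,\bar{\mathbf{x}} = A\bar{\mathbf{x}}$. If $\lambda$ is an
eigenvalue of $A$ and $\mathbf{x}$ is a corresponding eigenvector in $\mathbb{C}^n$, then

\[
A\bar{\mathbf{x}} = \overline{A\mathbf{x}} = \overline{\lambda\mathbf{x}} = \bar{\lambda}\,\bar{\mathbf{x}}
\]

Hence $\bar{\lambda}$ is also an eigenvalue of $A$, with $\bar{\mathbf{x}}$ a corresponding eigenvector. This shows that
when $A$ is real, its complex eigenvalues occur in conjugate pairs. (Here and elsewhere,
we use the term complex eigenvalue to refer to an eigenvalue $\lambda = a + bi$, with $b \neq 0$.)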

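EXAMPLE 5 The eigenvalues of the real matrix in Example 2 are complex conjugates,
namely, $.8 - .6i$ and $.8 + .6i$. The corresponding eigenvectors found in Example 2
are also conjugates:

\[
\mathbf{v}_1 = \begin{bmatrix} -2 - 4i \\ 5 \end{bmatrix} \quad \text{and} \quad
\mathbf{v}_2 = \begin{bmatrix} -2 + 4i \\ 5 \end{bmatrix} = \bar{\mathbf{v}}_1 \quad ■
\]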



The next example provides the basic “building block” for all real 2x2 matrices 
with complex eigenvalues. 



FIGURE 2 


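EXAMPLE 6 If $C = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}$, where $a$ and $b$ are real and not both zero, then the
eigenvalues of $C$ are $\lambda = a \pm bi$. (See the Practice Problem at the end of this section.)
Also, if $r = |\lambda| = \sqrt{a^2 + b^2}$, then

\[
C = r\begin{bmatrix} a/r & -b/r \\ b/r & a/r \end{bmatrix}
= \begin{bmatrix} r & 0 \\ 0 & r \end{bmatrix}
\begin{bmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{bmatrix}
\]

where $\varphi$ is the angle between the positive $x$-axis and the ray from $(0, 0)$ through $(a, b)$.
See Figure 2 and Appendix B. The angle $\varphi$ is called the argument of $\lambda = a + bi$. Thus
the transformation $\mathbf{x} \mapsto C\mathbf{x}$ may be viewed as the composition of a rotation through the
angle $\varphi$ and a scaling by $|\lambda|$ (see Figure 3). ■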
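
The rotation-and-scaling reading of $C$ in Example 6 can be sketched in code as follows. The function names are illustrative, and the sample values $a = b = 1$ are an assumption chosen only to give a familiar angle. The sketch recovers $r$ and $\varphi$ from $a$ and $b$, then rebuilds the matrix from them.

```python
import math

def rotation_scaling(a, b):
    """Scale factor r and angle phi for C = [[a, -b], [b, a]] (a, b not both zero)."""
    r = math.hypot(a, b)          # r = |lambda| = sqrt(a^2 + b^2)
    phi = math.atan2(b, a)        # argument of a + bi, in (-pi, pi]
    return r, phi

def rebuild(r, phi):
    """Form r * [[cos phi, -sin phi], [sin phi, cos phi]]."""
    return [[r * math.cos(phi), -r * math.sin(phi)],
            [r * math.sin(phi), r * math.cos(phi)]]

r, phi = rotation_scaling(1.0, 1.0)
print(r, phi)
print(rebuild(r, phi))
```

Using `atan2` rather than `atan(b / a)` keeps the angle in the correct quadrant and avoids division by zero when $a = 0$.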


Finally, we are ready to uncover the rotation that is hidden within a real matrix 
having a complex eigenvalue. 
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FIGURE 3 A rotation followed by a 
scaling. 


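EXAMPLE 7 Let $A = \begin{bmatrix} .5 & -.6 \\ .75 & 1.1 \end{bmatrix}$, $\lambda = .8 - .6i$, and $\mathbf{v}_1 = \begin{bmatrix} -2 - 4i \\ 5 \end{bmatrix}$, as in
Example 2. Also, let $P$ be the $2 \times 2$ real matrix

\[
P = \begin{bmatrix} \operatorname{Re}\mathbf{v}_1 & \operatorname{Im}\mathbf{v}_1 \end{bmatrix}
= \begin{bmatrix} -2 & -4 \\ 5 & 0 \end{bmatrix}
\]

and let

\[
C = P^{-1}AP = \frac{1}{20}\begin{bmatrix} 0 & 4 \\ -5 & -2 \end{bmatrix}
\begin{bmatrix} .5 & -.6 \\ .75 & 1.1 \end{bmatrix}
\begin{bmatrix} -2 & -4 \\ 5 & 0 \end{bmatrix}
= \begin{bmatrix} .8 & -.6 \\ .6 & .8 \end{bmatrix}
\]

By Example 6, $C$ is a pure rotation because $|\lambda|^2 = (.8)^2 + (.6)^2 = 1$. From
$C = P^{-1}AP$, we obtain

\[
A = PCP^{-1} = P\begin{bmatrix} .8 & -.6 \\ .6 & .8 \end{bmatrix}P^{-1}
\]

Here is the rotation “inside” $A$! The matrix $P$ provides a change of variable, say,
$\mathbf{x} = P\mathbf{u}$. The action of $A$ amounts to a change of variable from $\mathbf{x}$ to $\mathbf{u}$, followed by
a rotation, and then a return to the original variable. See Figure 4. The rotation produces
an ellipse, as in Figure 1, instead of a circle, because the coordinate system determined
by the columns of $P$ is not rectangular and does not have equal unit lengths on the two
axes. ■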



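FIGURE 4 Rotation due to a complex eigenvalue. (Diagram: the map $\mathbf{x} \mapsto A\mathbf{x}$ is factored as a
change of variable from $\mathbf{x}$ to $\mathbf{u}$, the rotation $\mathbf{u} \mapsto C\mathbf{u}$, and a change of variable from $C\mathbf{u}$ back to $A\mathbf{x}$.)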


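The next theorem shows that the calculations in Example 7 can be carried out for
any $2 \times 2$ real matrix $A$ having a complex eigenvalue $\lambda$. The proof uses the fact that if
the entries in $A$ are real, then $A(\operatorname{Re}\mathbf{x}) = \operatorname{Re}(A\mathbf{x})$ and $A(\operatorname{Im}\mathbf{x}) = \operatorname{Im}(A\mathbf{x})$, and if $\mathbf{x}$ is an
eigenvector for a complex eigenvalue, then $\operatorname{Re}\mathbf{x}$ and $\operatorname{Im}\mathbf{x}$ are linearly independent in
$\mathbb{R}^2$. (See Exercises 25 and 26.) The details are omitted.


THEOREM 9


Let $A$ be a real $2 \times 2$ matrix with a complex eigenvalue $\lambda = a - bi$ ($b \neq 0$) and
an associated eigenvector $\mathbf{v}$ in $\mathbb{C}^2$. Then

\[
A = PCP^{-1}, \quad \text{where } P = \begin{bmatrix} \operatorname{Re}\mathbf{v} & \operatorname{Im}\mathbf{v} \end{bmatrix}
\text{ and } C = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}
\]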
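
The factorization in Theorem 9 can be verified numerically for the matrix of Example 2. The following is a sketch only, with hypothetical helper names (`factor_real_2x2`, `matmul`, `inverse`); it builds $P$ and $C$ from one eigenpair and multiplies out $PCP^{-1}$ to recover $A$.

```python
def factor_real_2x2(lam, v):
    """Given an eigenvalue lam = a - bi and an eigenvector v of a real 2x2 matrix,
    return P = [Re v  Im v] and C = [[a, -b], [b, a]] as in Theorem 9."""
    a, b = lam.real, -lam.imag
    P = [[v[0].real, v[0].imag],
         [v[1].real, v[1].imag]]
    C = [[a, -b], [b, a]]
    return P, C

def matmul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(M):
    """Inverse of an invertible 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

lam = 0.8 - 0.6j                 # eigenvalue from Example 2
v = [-2 - 4j, 5 + 0j]            # corresponding eigenvector
P, C = factor_real_2x2(lam, v)
A_rebuilt = matmul(matmul(P, C), inverse(P))
print(P)
print(C)
print(A_rebuilt)                 # should be close to [[.5, -.6], [.75, 1.1]]
```

Note the sign convention: the theorem writes the eigenvalue as $a - bi$, so $b$ is the negative of the imaginary part of `lam`.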
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FIGURE 5 

Iterates of two points under the 
action of a 3 x 3 matrix with a 
complex eigenvalue. 


The phenomenon displayed in Example 7 persists in higher dimensions. For
instance, if $A$ is a $3 \times 3$ matrix with a complex eigenvalue, then there is a plane in
$\mathbb{R}^3$ on which $A$ acts as a rotation (possibly combined with scaling). Every vector in that
plane is rotated into another point on the same plane. We say that the plane is invariant
under $A$.


EXAMPLE 8 The matrix A = 




has eigenvalues $.8 \pm .6i$ and $1.07$. Any vector $\mathbf{w}_0$ in the $x_1x_2$-plane (with third
coordinate 0) is rotated by $A$ into another point in the plane. Any vector $\mathbf{x}_0$ not in the
plane has its $x_3$-coordinate multiplied by $1.07$. The iterates of the points
$\mathbf{w}_0 = (2, 0, 0)$ and $\mathbf{x}_0 = (2, 0, 1)$ under multiplication
by $A$ are shown in Figure 5. ■


PRACTICE PROBLEM 


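Show that if $a$ and $b$ are real, then the eigenvalues of $A = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}$ are $a \pm bi$, with
corresponding eigenvectors $\begin{bmatrix} 1 \\ -i \end{bmatrix}$ and $\begin{bmatrix} 1 \\ i \end{bmatrix}$.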


5.5 EXERCISES 


Let each matrix in Exercises 1-6 act on C 2 . Find the eigenvalues 
and a basis for each eigenspace in C 2 . 






1 -.8 
4 -2.2 



3. 

1 

-2 

5" 

3 _ 

4 . 

"5 -2" 
_ 1 3 _ 

5. 

0 

-8 

r 

4 

6. 

4 3 

-3 4 


In Exercises 7-12, use Example 6 to list the eigenvalues of $A$.
In each case, the transformation $\mathbf{x} \mapsto A\mathbf{x}$ is the composition of
a rotation and a scaling. Give the angle $\varphi$ of the rotation, where
$-\pi < \varphi \le \pi$, and give the scale factor $r$.



21. In Example 2, solve the first equation in (2) for $x_2$ in terms of
$x_1$, and from that produce the eigenvector $\mathbf{y} = \begin{bmatrix} 2 \\ -1 + 2i \end{bmatrix}$
for the matrix $A$. Show that this $\mathbf{y}$ is a (complex) multiple of
the vector $\mathbf{v}_1$ used in Example 2.


7. 

1 1 

1_1 


8. 

" v7 3' 
_ -3 V3_ 

9. 

CN CN 

§7 

1_1 

1 / 2 ' 

—V3/2_ 

10. 

' -5 -5' 
5 -5 

11. 

.1 .1“ 

-.1 .1 


12. 

1 

cn O 
• 

O cn 
• 

1_ 


22. Let $A$ be a complex (or real) $n \times n$ matrix, and let $\mathbf{x}$ in $\mathbb{C}^n$ be
an eigenvector corresponding to an eigenvalue $\lambda$ in $\mathbb{C}$. Show
that for each nonzero complex scalar $\mu$, the vector $\mu\mathbf{x}$ is an
eigenvector of $A$.

Chapter 7 will focus on matrices $A$ with the property that $A^T = A$.
Exercises 23 and 24 show that every eigenvalue of such a matrix
is necessarily real.


In Exercises 13-20, find an invertible matrix $P$ and a matrix
$C$ of the form $\begin{bmatrix} a & -b \\ b & a \end{bmatrix}$ such that the given matrix has the
form $A = PCP^{-1}$. For Exercises 13-16, use information from
Exercises 1-4.



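23. Let $A$ be an $n \times n$ real matrix with the property that $A^T = A$,
let $\mathbf{x}$ be any vector in $\mathbb{C}^n$, and let $q = \bar{\mathbf{x}}^T A\mathbf{x}$. The equalities
below show that $q$ is a real number by verifying that $\bar{q} = q$.
Give a reason for each step.

\[
\bar{q} \underset{(a)}{=} \overline{\bar{\mathbf{x}}^T A\mathbf{x}}
\underset{(b)}{=} \mathbf{x}^T\overline{A\mathbf{x}}
\underset{(c)}{=} \mathbf{x}^T A\bar{\mathbf{x}}
\underset{(d)}{=} (\mathbf{x}^T A\bar{\mathbf{x}})^T
\underset{(e)}{=} \bar{\mathbf{x}}^T A^T\mathbf{x} = q
\]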


13. 





































































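24. Let $A$ be an $n \times n$ real matrix with the property that $A^T = A$.
Show that if $A\mathbf{x} = \lambda\mathbf{x}$ for some nonzero vector $\mathbf{x}$ in $\mathbb{C}^n$, then,
in fact, $\lambda$ is real and the real part of $\mathbf{x}$ is an eigenvector of $A$.
[Hint: Compute $\bar{\mathbf{x}}^T A\mathbf{x}$, and use Exercise 23. Also, examine
the real and imaginary parts of $A\mathbf{x}$.]

25. Let $A$ be a real $n \times n$ matrix, and let $\mathbf{x}$ be a vector in $\mathbb{C}^n$. Show
that $\operatorname{Re}(A\mathbf{x}) = A(\operatorname{Re}\mathbf{x})$ and $\operatorname{Im}(A\mathbf{x}) = A(\operatorname{Im}\mathbf{x})$.

26. Let $A$ be a real $2 \times 2$ matrix with a complex eigenvalue
$\lambda = a - bi$ ($b \neq 0$) and an associated eigenvector $\mathbf{v}$ in $\mathbb{C}^2$.

a. Show that $A(\operatorname{Re}\mathbf{v}) = a\operatorname{Re}\mathbf{v} + b\operatorname{Im}\mathbf{v}$ and $A(\operatorname{Im}\mathbf{v}) =
-b\operatorname{Re}\mathbf{v} + a\operatorname{Im}\mathbf{v}$. [Hint: Write $\mathbf{v} = \operatorname{Re}\mathbf{v} + i\operatorname{Im}\mathbf{v}$, and
compute $A\mathbf{v}$.]

b. Verify that if $P$ and $C$ are given as in Theorem 9, then
$AP = PC$.


[M] In Exercises 27 and 28, find a factorization of the given
matrix $A$ in the form $A = PCP^{-1}$, where $C$ is a block-diagonal
matrix with $2 \times 2$ blocks of the form shown in Example 6. (For
each conjugate pair of eigenvalues, use the real and imaginary
parts of one eigenvector in $\mathbb{C}^4$ to create two columns of $P$.)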


27. 


28. 


.7 

1.1 

2.0 

1.7 

-2.0 

-4.0 

-8.6 

-7.4 

0 

-.5 

-1.0 

-1.0 

1.0 

2.8 

6.0 

5.3 

-1.4 

-2.0 

-2.0 

-2.0 

-1.3 

-.8 

-.1 

—.6 

.3 

-1.9 

-1.6 

-1.4 

2.0 

3.3 

2.3 

2.6 


SOLUTION TO PRACTICE PROBLEM 

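Remember that it is easy to test whether a vector is an eigenvector. There is no need to
examine the characteristic equation. Compute

\[
A\mathbf{x} = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}\begin{bmatrix} 1 \\ -i \end{bmatrix}
= \begin{bmatrix} a + bi \\ b - ai \end{bmatrix}
= (a + bi)\begin{bmatrix} 1 \\ -i \end{bmatrix}
\]

Thus $\begin{bmatrix} 1 \\ -i \end{bmatrix}$ is an eigenvector corresponding to $\lambda = a + bi$. From the discussion in this
section, $\begin{bmatrix} 1 \\ i \end{bmatrix}$ must be an eigenvector corresponding to $\bar{\lambda} = a - bi$.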


5.6 DISCRETE DYNAMICAL SYSTEMS 


Eigenvalues and eigenvectors provide the key to understanding the long-term behavior,
or evolution, of a dynamical system described by a difference equation $\mathbf{x}_{k+1} = A\mathbf{x}_k$.
Such an equation was used to model population movement in Section 1.10, various
Markov chains in Section 4.9, and the spotted owl population in the introductory
example for this chapter. The vectors $\mathbf{x}_k$ give information about the system as time
(denoted by $k$) passes. In the spotted owl example, for instance, $\mathbf{x}_k$ listed the numbers
of owls in three age classes at time $k$.

The applications in this section focus on ecological problems because they are easier
to state and explain than, say, problems in physics or engineering. However, dynamical
systems arise in many scientific fields. For instance, standard undergraduate courses
in control systems discuss several aspects of dynamical systems. The modern state-space
design method in such courses relies heavily on matrix algebra.¹ The steady-state
response of a control system is the engineering equivalent of what we call here the
“long-term behavior” of the dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$.


¹ See G. F. Franklin, J. D. Powell, and A. Emami-Naeini, Feedback Control of Dynamic Systems, 5th ed.
(Upper Saddle River, NJ: Prentice-Hall, 2006). This undergraduate text has a nice introduction to dynamic
models (Chapter 2). State-space design is covered in Chapters 7 and 8.
Until Example 6, we assume that $A$ is diagonalizable, with $n$ linearly independent
eigenvectors, $\mathbf{v}_1, \ldots, \mathbf{v}_n$, and corresponding eigenvalues, $\lambda_1, \ldots, \lambda_n$. For convenience,
assume the eigenvectors are arranged so that $|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n|$. Since
$\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ is a basis for $\mathbb{R}^n$, any initial vector $\mathbf{x}_0$ can be written uniquely as

\[
\mathbf{x}_0 = c_1\mathbf{v}_1 + \cdots + c_n\mathbf{v}_n \tag{1}
\]

This eigenvector decomposition of $\mathbf{x}_0$ determines what happens to the sequence $\{\mathbf{x}_k\}$.
The next calculation generalizes the simple case examined in Example 5 of Section 5.2.
Since the $\mathbf{v}_i$ are eigenvectors,

\[
\begin{aligned}
\mathbf{x}_1 = A\mathbf{x}_0 &= c_1A\mathbf{v}_1 + \cdots + c_nA\mathbf{v}_n \\
&= c_1\lambda_1\mathbf{v}_1 + \cdots + c_n\lambda_n\mathbf{v}_n
\end{aligned}
\]

In general,

\[
\mathbf{x}_k = c_1(\lambda_1)^k\mathbf{v}_1 + \cdots + c_n(\lambda_n)^k\mathbf{v}_n \qquad (k = 0, 1, 2, \ldots) \tag{2}
\]

The examples that follow illustrate what can happen in (2) as $k \to \infty$.
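
Formula (2) can be compared with direct iteration in a small experiment. The sketch below is illustrative only: the matrix, eigenvalues, eigenvectors, and initial vector are sample data assumed for the demonstration (they are not taken from this section), and the helper names are hypothetical. The weights $c_1, c_2$ are found by solving the 2 x 2 system in (1) with Cramer's rule.

```python
def solve_2x2(M, rhs):
    """Solve M c = rhs for a 2x2 matrix M by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    c1 = (rhs[0] * M[1][1] - M[0][1] * rhs[1]) / det
    c2 = (M[0][0] * rhs[1] - rhs[0] * M[1][0]) / det
    return c1, c2

def x_closed_form(k, lams, vecs, x0):
    """x_k = c1*lam1^k*v1 + c2*lam2^k*v2, as in equation (2)."""
    v1, v2 = vecs
    # Columns of the coefficient matrix are the eigenvectors
    c1, c2 = solve_2x2([[v1[0], v2[0]], [v1[1], v2[1]]], x0)
    return [c1 * lams[0] ** k * v1[i] + c2 * lams[1] ** k * v2[i] for i in range(2)]

def x_iterated(k, A, x0):
    """x_k obtained by applying x -> A x a total of k times."""
    x = list(x0)
    for _ in range(k):
        x = [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]
    return x

# Illustrative data (an assumption, not taken from this section)
A = [[0.95, 0.03], [0.05, 0.97]]
lams = (1.0, 0.92)
vecs = ([3.0, 5.0], [1.0, -1.0])
x0 = [0.6, 0.4]

print(x_closed_form(10, lams, vecs, x0))
print(x_iterated(10, A, x0))
```

The two printed vectors should agree up to rounding, which is the content of equation (2).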

A Predator-Prey System 

Deep in the redwood forests of California, dusky-footed wood rats provide up to 80% of 
the diet for the spotted owl, the main predator of the wood rat. Example 1 uses a linear 
dynamical system to model the physical system of the owls and the rats. (Admittedly, 
the model is unrealistic in several respects, but it can provide a starting point for the 
study of more complicated nonlinear models used by environmental scientists.) 


EXAMPLE 1 Denote the owl and wood rat populations at time $k$ by $\mathbf{x}_k = \begin{bmatrix} O_k \\ R_k \end{bmatrix}$,
where $k$ is the time in months, $O_k$ is the number of owls in the region studied, and $R_k$
is the number of rats (measured in thousands). Suppose

\[
\begin{aligned}
O_{k+1} &= (.5)O_k + (.4)R_k \\
R_{k+1} &= -p \cdot O_k + (1.1)R_k
\end{aligned}
\tag{3}
\]

where $p$ is a positive parameter to be specified. The $(.5)O_k$ in the first equation says
that with no wood rats for food, only half of the owls will survive each month, while the
$(1.1)R_k$ in the second equation says that with no owls as predators, the rat population
will grow by 10% per month. If rats are plentiful, the $(.4)R_k$ will tend to make the
owl population rise, while the negative term $-p \cdot O_k$ measures the deaths of rats due
to predation by owls. (In fact, $1000p$ is the average number of rats eaten by one owl in
one month.) Determine the evolution of this system when the predation parameter $p$ is
$.104$.

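SOLUTION When $p = .104$, the eigenvalues of the coefficient matrix $A$ for the equations
in (3) turn out to be $\lambda_1 = 1.02$ and $\lambda_2 = .58$. Corresponding eigenvectors are

\[
\mathbf{v}_1 = \begin{bmatrix} 10 \\ 13 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 5 \\ 1 \end{bmatrix}
\]

An initial $\mathbf{x}_0$ can be written as $\mathbf{x}_0 = c_1\mathbf{v}_1 + c_2\mathbf{v}_2$. Then, for $k \ge 0$,

\[
\mathbf{x}_k = c_1(1.02)^k\mathbf{v}_1 + c_2(.58)^k\mathbf{v}_2
= c_1(1.02)^k\begin{bmatrix} 10 \\ 13 \end{bmatrix} + c_2(.58)^k\begin{bmatrix} 5 \\ 1 \end{bmatrix}
\]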












As $k \to \infty$, $(.58)^k$ rapidly approaches zero. Assume $c_1 > 0$. Then, for all sufficiently
large $k$, $\mathbf{x}_k$ is approximately the same as $c_1(1.02)^k\mathbf{v}_1$, and we write

\[
\mathbf{x}_k \approx c_1(1.02)^k\begin{bmatrix} 10 \\ 13 \end{bmatrix} \tag{4}
\]

The approximation in (4) improves as $k$ increases, and so for large $k$,

\[
\mathbf{x}_{k+1} \approx c_1(1.02)^{k+1}\begin{bmatrix} 10 \\ 13 \end{bmatrix}
= (1.02)\,c_1(1.02)^k\begin{bmatrix} 10 \\ 13 \end{bmatrix} \approx 1.02\,\mathbf{x}_k \tag{5}
\]

The approximation in (5) says that eventually both entries of $\mathbf{x}_k$ (the numbers of owls
and rats) grow by a factor of almost 1.02 each month, a 2% monthly growth rate. By
(4), $\mathbf{x}_k$ is approximately a multiple of $(10, 13)$, so the entries in $\mathbf{x}_k$ are nearly in the same
ratio as 10 to 13. That is, for every 10 owls there are about 13 thousand rats. ■
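
The long-term behavior described in Example 1 can be observed by simulating the two equations in (3). This is a minimal sketch: the starting populations (20 owls, 20 thousand rats) and the 60-month horizon are hypothetical values chosen only for illustration, and the function names are not from the text.

```python
def step(p, owls, rats):
    """One month of the model in (3); rats are measured in thousands."""
    return 0.5 * owls + 0.4 * rats, -p * owls + 1.1 * rats

def simulate(p, owls, rats, months):
    """Return the list of (owls, rats) pairs for months 0, 1, ..., months."""
    history = [(owls, rats)]
    for _ in range(months):
        owls, rats = step(p, owls, rats)
        history.append((owls, rats))
    return history

# Hypothetical starting populations, chosen only for illustration
history = simulate(p=0.104, owls=20.0, rats=20.0, months=60)
(o_prev, r_prev), (o_last, r_last) = history[-2], history[-1]
print(o_last / o_prev, r_last / r_prev)   # monthly growth factors
print(r_last / o_last)                    # thousands of rats per owl
```

With these starting values the weight $c_1$ is positive, so the growth factors should settle near 1.02 and the rat-to-owl ratio near 13/10.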


Example 1 illustrates two general facts about a dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$ in
which $A$ is $n \times n$, its eigenvalues satisfy $|\lambda_1| \ge 1$ and $1 > |\lambda_j|$ for $j = 2, \ldots, n$, and $\mathbf{v}_1$
is an eigenvector corresponding to $\lambda_1$. If $\mathbf{x}_0$ is given by equation (1), with $c_1 \neq 0$, then
for all sufficiently large $k$,

\[
\mathbf{x}_{k+1} \approx \lambda_1\mathbf{x}_k \tag{6}
\]

and

\[
\mathbf{x}_k \approx c_1(\lambda_1)^k\mathbf{v}_1 \tag{7}
\]

The approximations in (6) and (7) can be made as close as desired by taking $k$
sufficiently large. By (6), the $\mathbf{x}_k$ eventually grow almost by a factor of $\lambda_1$ each time,
so $\lambda_1$ determines the eventual growth rate of the system. Also, by (7), the ratio of any
two entries in $\mathbf{x}_k$ (for large $k$) is nearly the same as the ratio of the corresponding entries
in $\mathbf{v}_1$. The case in which $\lambda_1 = 1$ is illustrated in Example 5 in Section 5.2.


Graphical Description of Solutions 

When $A$ is $2 \times 2$, algebraic calculations can be supplemented by a geometric description
of a system’s evolution. We can view the equation $\mathbf{x}_{k+1} = A\mathbf{x}_k$ as a description of what
happens to an initial point $\mathbf{x}_0$ in $\mathbb{R}^2$ as it is transformed repeatedly by the mapping
$\mathbf{x} \mapsto A\mathbf{x}$. The graph of $\mathbf{x}_0, \mathbf{x}_1, \ldots$ is called a trajectory of the dynamical system.


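EXAMPLE 2 Plot several trajectories of the dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$, when

\[
A = \begin{bmatrix} .80 & 0 \\ 0 & .64 \end{bmatrix}
\]

SOLUTION The eigenvalues of $A$ are $.8$ and $.64$, with eigenvectors $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and
$\mathbf{v}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$. If $\mathbf{x}_0 = c_1\mathbf{v}_1 + c_2\mathbf{v}_2$, then

\[
\mathbf{x}_k = c_1(.8)^k\begin{bmatrix} 1 \\ 0 \end{bmatrix} + c_2(.64)^k\begin{bmatrix} 0 \\ 1 \end{bmatrix}
\]

Of course, $\mathbf{x}_k$ tends to $\mathbf{0}$ because $(.8)^k$ and $(.64)^k$ both approach 0 as $k \to \infty$. But the way
$\mathbf{x}_k$ goes toward $\mathbf{0}$ is interesting. Figure 1 shows the first few terms of several trajectories
that begin at points on the boundary of the box with corners at $(\pm 3, \pm 3)$. The points on
each trajectory are connected by a thin curve, to make the trajectory easier to see. ■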
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FIGURE 1 The origin as an attractor. 


In Example 2, the origin is called an attractor of the dynamical system because
all trajectories tend toward $\mathbf{0}$. This occurs whenever both eigenvalues are less than 1
in magnitude. The direction of greatest attraction is along the line through $\mathbf{0}$ and the
eigenvector $\mathbf{v}_2$ for the eigenvalue of smaller magnitude.

In the next example, both eigenvalues of $A$ are larger than 1 in magnitude, and $\mathbf{0}$
is called a repeller of the dynamical system. All solutions of $\mathbf{x}_{k+1} = A\mathbf{x}_k$ except the
(constant) zero solution are unbounded and tend away from the origin.²



EXAMPLE 3 Plot several typical solutions of the equation $\mathbf{x}_{k+1} = A\mathbf{x}_k$, where

\[
A = \begin{bmatrix} 1.44 & 0 \\ 0 & 1.2 \end{bmatrix}
\]

² The origin is the only possible attractor or repeller in a linear dynamical system, but there can be multiple
attractors and repellers in a more general dynamical system for which the mapping $\mathbf{x}_k \mapsto \mathbf{x}_{k+1}$ is not linear.
In such a system, attractors and repellers are defined in terms of the eigenvalues of a special matrix (with
variable entries) called the Jacobian matrix of the system.
SOLUTION The eigenvalues of $A$ are $1.44$ and $1.2$. If $\mathbf{x}_0 = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}$, then

\[
\mathbf{x}_k = c_1(1.44)^k\begin{bmatrix} 1 \\ 0 \end{bmatrix} + c_2(1.2)^k\begin{bmatrix} 0 \\ 1 \end{bmatrix}
\]

Both terms grow in size, but the first term grows faster. So the direction of greatest
repulsion is the line through $\mathbf{0}$ and the eigenvector for the eigenvalue of larger magnitude.
Figure 2 shows several trajectories that begin at points quite close to $\mathbf{0}$. ■


In the next example, 0 is called a saddle point because the origin attracts solutions 
from some directions and repels them in other directions. This occurs whenever one 
eigenvalue is greater than 1 in magnitude and the other is less than 1 in magnitude. The 
direction of greatest attraction is determined by an eigenvector for the eigenvalue of 
smaller magnitude. The direction of greatest repulsion is determined by an eigenvector 
for the eigenvalue of greater magnitude. 


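EXAMPLE 4 Plot several typical solutions of the equation $\mathbf{y}_{k+1} = D\mathbf{y}_k$, where

\[
D = \begin{bmatrix} 2.0 & 0 \\ 0 & 0.5 \end{bmatrix}
\]

(We write $D$ and $\mathbf{y}$ here instead of $A$ and $\mathbf{x}$ because this example will be used later.)
Show that a solution $\{\mathbf{y}_k\}$ is unbounded if its initial point is not on the $x_2$-axis.


SOLUTION The eigenvalues of $D$ are 2 and $.5$. If $\mathbf{y}_0 = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}$, then

\[
\mathbf{y}_k = c_1 2^k\begin{bmatrix} 1 \\ 0 \end{bmatrix} + c_2(.5)^k\begin{bmatrix} 0 \\ 1 \end{bmatrix} \tag{8}
\]

If $\mathbf{y}_0$ is on the $x_2$-axis, then $c_1 = 0$ and $\mathbf{y}_k \to \mathbf{0}$ as $k \to \infty$. But if $\mathbf{y}_0$ is not on the $x_2$-axis,
then the first term in the sum for $\mathbf{y}_k$ becomes arbitrarily large, and so $\{\mathbf{y}_k\}$ is unbounded.
Figure 3 shows ten trajectories that begin near or on the $x_2$-axis. ■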
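
The three cases met so far (attractor, repeller, saddle point) depend only on the magnitudes of the eigenvalues, so the classification is easy to express in code. The function below is a sketch with a hypothetical name; it covers the 2 x 2 cases described in the text and reports anything else, such as an eigenvalue of magnitude exactly 1, as not covered.

```python
def classify_origin(eigenvalues):
    """Classify 0 for x_{k+1} = A x_k from the two eigenvalues of a 2x2 matrix A."""
    mags = [abs(lam) for lam in eigenvalues]
    if all(m < 1 for m in mags):
        return "attractor"
    if all(m > 1 for m in mags):
        return "repeller"
    if min(mags) < 1 < max(mags):
        return "saddle point"
    return "not covered"   # some eigenvalue has magnitude exactly 1

print(classify_origin([0.8, 0.64]))   # eigenvalues of Example 2
print(classify_origin([1.44, 1.2]))   # eigenvalues of Example 3
print(classify_origin([2, 0.5]))      # eigenvalues of Example 4
```

Because `abs` also works on Python complex numbers, the same function can be applied to a conjugate pair of complex eigenvalues.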







FIGURE 3 The origin as a saddle point. 


Change of Variable 

The preceding three examples involved diagonal matrices. To handle the nondiagonal
case, we return for a moment to the $n \times n$ case in which eigenvectors of $A$ form a
basis $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ for $\mathbb{R}^n$. Let $P = [\,\mathbf{v}_1 \ \cdots \ \mathbf{v}_n\,]$, and let $D$ be the diagonal matrix
with the corresponding eigenvalues on the diagonal. Given a sequence $\{\mathbf{x}_k\}$ satisfying
$\mathbf{x}_{k+1} = A\mathbf{x}_k$, define a new sequence $\{\mathbf{y}_k\}$ by

\[
\mathbf{y}_k = P^{-1}\mathbf{x}_k, \quad \text{or equivalently,} \quad \mathbf{x}_k = P\mathbf{y}_k
\]

Substituting these relations into the equation $\mathbf{x}_{k+1} = A\mathbf{x}_k$ and using the fact that
$A = PDP^{-1}$, we find that

\[
P\mathbf{y}_{k+1} = AP\mathbf{y}_k = (PDP^{-1})P\mathbf{y}_k = PD\mathbf{y}_k
\]

Left-multiplying both sides by $P^{-1}$, we obtain

\[
\mathbf{y}_{k+1} = D\mathbf{y}_k
\]

If we write $\mathbf{y}_k$ as $\mathbf{y}(k)$ and denote the entries in $\mathbf{y}(k)$ by $y_1(k), \ldots, y_n(k)$, then

\[
\begin{bmatrix} y_1(k+1) \\ y_2(k+1) \\ \vdots \\ y_n(k+1) \end{bmatrix}
= \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \lambda_n \end{bmatrix}
\begin{bmatrix} y_1(k) \\ y_2(k) \\ \vdots \\ y_n(k) \end{bmatrix}
\]

The change of variable from $\mathbf{x}_k$ to $\mathbf{y}_k$ has decoupled the system of difference equations.
The evolution of $y_1(k)$, for example, is unaffected by what happens to $y_2(k), \ldots, y_n(k)$,
because $y_1(k+1) = \lambda_1 \cdot y_1(k)$ for each $k$.

The equation $\mathbf{x}_k = P\mathbf{y}_k$ says that $\mathbf{y}_k$ is the coordinate vector of $\mathbf{x}_k$ with respect to
the eigenvector basis $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$. We can decouple the system $\mathbf{x}_{k+1} = A\mathbf{x}_k$ by making
calculations in the new eigenvector coordinate system. When $n = 2$, this amounts to
using graph paper with axes in the directions of the two eigenvectors.


EXAMPLE 5 Show that the origin is a saddle point for solutions of $\mathbf{x}_{k+1} = A\mathbf{x}_k$,
where

\[
A = \begin{bmatrix} 1.25 & -.75 \\ -.75 & 1.25 \end{bmatrix}
\]

Find the directions of greatest attraction and greatest repulsion.

SOLUTION Using standard techniques, we find that $A$ has eigenvalues 2 and $.5$, with
corresponding eigenvectors $\mathbf{v}_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, respectively. Since $|2| > 1$ and
$|.5| < 1$, the origin is a saddle point of the dynamical system. If $\mathbf{x}_0 = c_1\mathbf{v}_1 + c_2\mathbf{v}_2$, then

\[
\mathbf{x}_k = c_1 2^k\mathbf{v}_1 + c_2(.5)^k\mathbf{v}_2 \tag{9}
\]

This equation looks just like equation (8) in Example 4, with $\mathbf{v}_1$ and $\mathbf{v}_2$ in place of the
standard basis.

On graph paper, draw axes through $\mathbf{0}$ and the eigenvectors $\mathbf{v}_1$ and $\mathbf{v}_2$. See Figure 4.
Movement along these axes corresponds to movement along the standard axes in
Figure 3. In Figure 4, the direction of greatest repulsion is the line through $\mathbf{0}$ and the
eigenvector $\mathbf{v}_1$ whose eigenvalue is greater than 1 in magnitude. If $\mathbf{x}_0$ is on this line, the
$c_2$ in (9) is zero and $\mathbf{x}_k$ moves quickly away from $\mathbf{0}$. The direction of greatest attraction
is determined by the eigenvector $\mathbf{v}_2$ whose eigenvalue is less than 1 in magnitude.

A number of trajectories are shown in Figure 4. When this graph is viewed in
terms of the eigenvector axes, the picture “looks” essentially the same as the picture
in Figure 3. ■
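
The change of variable can be traced numerically for the matrix of Example 5. The sketch below is illustrative: the initial point (3, 1) is a hypothetical choice, and the helper names are not from the text. It updates the same trajectory twice, once with $A$ in standard coordinates and once with $D$ in eigenvector coordinates, and converts the second back with $P$ so the two can be compared.

```python
A = [[1.25, -0.75], [-0.75, 1.25]]
P = [[1.0, 1.0], [-1.0, 1.0]]        # columns are v1 = (1, -1) and v2 = (1, 1)
D = [[2.0, 0.0], [0.0, 0.5]]

def matvec(M, x):
    """Product of a 2x2 matrix and a vector."""
    return [M[0][0] * x[0] + M[0][1] * x[1], M[1][0] * x[0] + M[1][1] * x[1]]

def inverse(M):
    """Inverse of an invertible 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

P_inv = inverse(P)
x = [3.0, 1.0]                       # hypothetical initial point x_0
y = matvec(P_inv, x)                 # y_0 = P^{-1} x_0

for k in range(5):
    x = matvec(A, x)                 # coupled update in x-coordinates
    y = matvec(D, y)                 # decoupled update: each entry scales on its own
    print(k + 1, x, matvec(P, y))    # the two vectors should agree
```

In the $\mathbf{y}$-coordinates the first entry doubles and the second halves at every step, which is the saddle-point behavior of Example 4.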
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FIGURE 4 The origin as a saddle point. 



Complex Eigenvalues 

When a real $2 \times 2$ matrix $A$ has complex eigenvalues, $A$ is not diagonalizable (when
acting on $\mathbb{R}^2$), but the dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$ is easy to describe. Example 3
of Section 5.5 illustrated the case in which the eigenvalues have absolute value 1. The
iterates of a point $\mathbf{x}_0$ spiraled around the origin along an elliptical trajectory.

If $A$ has two complex eigenvalues whose absolute value is greater than 1, then $\mathbf{0}$ is
a repeller and iterates of $\mathbf{x}_0$ will spiral outward around the origin. If the absolute values
of the complex eigenvalues are less than 1, then the origin is an attractor and the iterates
of $\mathbf{x}_0$ spiral inward toward the origin, as in the following example.


EXAMPLE 6 It can be verified that the matrix 




has eigenvalues $.9 \pm .2i$, with eigenvectors

. Figure 5 shows three trajectories
of the system $\mathbf{x}_{k+1} = A\mathbf{x}_k$, with initial vectors



Survival of the Spotted Owls 

Recall from this chapter’s introductory example that the spotted owl population in the
Willow Creek area of California was modeled by a dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$ in
which the entries in $\mathbf{x}_k = (j_k, s_k, a_k)$ listed the numbers of females (at time $k$) in the
juvenile, subadult, and adult life stages, respectively, and $A$ is the stage-matrix
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FIGURE 5 Rotation associated with complex 
eigenvalues. 


MATLAB shows that the eigenvalues of $A$ are approximately $\lambda_1 = .98$,
$\lambda_2 = -.02 + .21i$, and $\lambda_3 = -.02 - .21i$. Observe that all three eigenvalues are
less than 1 in magnitude, because $|\lambda_2|^2 = |\lambda_3|^2 = (-.02)^2 + (.21)^2 = .0445$.

For the moment, let $A$ act on the complex vector space $\mathbb{C}^3$. Then, because $A$ has
three distinct eigenvalues, the three corresponding eigenvectors are linearly independent
and form a basis for $\mathbb{C}^3$. Denote the eigenvectors by $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$. Then the general
solution of $\mathbf{x}_{k+1} = A\mathbf{x}_k$ (using vectors in $\mathbb{C}^3$) has the form

\[
\mathbf{x}_k = c_1(\lambda_1)^k\mathbf{v}_1 + c_2(\lambda_2)^k\mathbf{v}_2 + c_3(\lambda_3)^k\mathbf{v}_3 \tag{11}
\]

If $\mathbf{x}_0$ is a real initial vector, then $\mathbf{x}_1 = A\mathbf{x}_0$ is real because $A$ is real. Similarly, the
equation $\mathbf{x}_{k+1} = A\mathbf{x}_k$ shows that each $\mathbf{x}_k$ on the left side of (11) is real, even though
it is expressed as a sum of complex vectors. However, each term on the right side
of (11) is approaching the zero vector, because the eigenvalues are all less than 1 in
magnitude. Therefore the real sequence $\mathbf{x}_k$ approaches the zero vector, too. Sadly, this
model predicts that the spotted owls will eventually all perish.
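
The magnitude check above takes one line per eigenvalue in code. The following sketch uses the approximate eigenvalues quoted in the text; the time horizons 10, 50, and 100 are arbitrary illustrative choices. It also shows how quickly the slowest-decaying factor $|\lambda|^k$ shrinks.

```python
eigenvalues = [0.98, -0.02 + 0.21j, -0.02 - 0.21j]   # approximate values quoted above

for lam in eigenvalues:
    print(lam, abs(lam))          # abs() of a complex number is its magnitude

# |lambda|^k for the slowest-decaying eigenvalue after k time steps
dominant = max(abs(lam) for lam in eigenvalues)
for k in (10, 50, 100):
    print(k, dominant ** k)
```

Since the largest magnitude is only slightly below 1, the predicted decline is slow, although every term in (11) still tends to zero.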

Is there hope for the spotted owl? Recall from the introductory example that the 
18% entry in the matrix A in (10) comes from the fact that although 60% of the juvenile 
owls live long enough to leave the nest and search for new home territories, only 30% 
of that group survive the search and find new home ranges. Search survival is strongly 
influenced by the number of clear-cut areas in the forest, which make the search more 
difficult and dangerous. 

Some owl populations live in areas with few or no clear-cut areas. It may be that 
a larger percentage of the juvenile owls there survive and find new home ranges. Of 
course, the problem of the spotted owl is more complex than we have described, but the 
final example provides a happy ending to the story. 

EXAMPLE 7 Suppose the search survival rate of the juvenile owls is 50%, so the
(2,1)-entry in the stage-matrix $A$ in (10) is $.3$ instead of $.18$. What does the stage-matrix
model predict about this spotted owl population?

SOLUTION Now the eigenvalues of $A$ turn out to be approximately $\lambda_1 = 1.01$,
$\lambda_2 = -.03 + .26i$, and $\lambda_3 = -.03 - .26i$. An eigenvector for $\lambda_1$ is approximately
$\mathbf{v}_1 = (10, 3, 31)$. Let $\mathbf{v}_2$ and $\mathbf{v}_3$ be (complex) eigenvectors for $\lambda_2$ and $\lambda_3$. In this case,
equation (11) becomes

\[
\mathbf{x}_k = c_1(1.01)^k\mathbf{v}_1 + c_2(-.03 + .26i)^k\mathbf{v}_2 + c_3(-.03 - .26i)^k\mathbf{v}_3
\]

As $k \to \infty$, the second two vectors tend to zero. So $\mathbf{x}_k$ becomes more and more like
the (real) vector $c_1(1.01)^k\mathbf{v}_1$. The approximations in equations (6) and (7), following
Example 1, apply here. Also, it can be shown that the constant $c_1$ in the initial
decomposition of $\mathbf{x}_0$ is positive when the entries in $\mathbf{x}_0$ are nonnegative. Thus the owl
population will grow slowly, with a long-term growth rate of 1.01. The eigenvector $\mathbf{v}_1$
describes the eventual distribution of the owls by life stages: for every 31 adults, there
will be about 10 juveniles and 3 subadults. ■

Further Reading 

Franklin, G. F., J. D. Powell, and M. F. Workman. Digital Control of Dynamic Systems , 
3rd ed. Reading, MA: Addison-Wesley, 1998. 

Sandefur, James T. Discrete Dynamical Systems—Theory and Applications. Oxford: 
Oxford University Press, 1990. 

Tuchinsky, Philip. Management of a Buffalo Herd, UMAP Module 207. Lexington, MA:
COMAP, 1980.


PRACTICE PROBLEMS 


O 1 

1. The matrix A below has eigenvalues 1, and with corresponding eigenvectors 
vi, v 2 ,and v 3 : 


1 

A = - 
9 

i_ 

-2 

i 

o 


i 

<N 

1 _ 


i 

<N 

1 _ 


1 - 

1 _ 

-2 

6 

2 

, Vi = 

2 

, v 2 = 

1 

, v 3 = 

2 

i 

o 

2 

5 


i 

_i 


1 

<N 

_1 


—i 

<N 

_1 


Find the general solution of the equation $\mathbf{x}_{k+1} = A\mathbf{x}_k$ if $\mathbf{x}_0 =$
2. What happens to the sequence $\{\mathbf{x}_k\}$ in Practice Problem 1 as $k \to \infty$?


5.6 EXERCISES 


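1. Let $A$ be a $2 \times 2$ matrix with eigenvalues 3 and 1/3 and corresponding eigenvectors
$\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$. Let $\{\mathbf{x}_k\}$ be a solution of the difference equation
$\mathbf{x}_{k+1} = A\mathbf{x}_k$, $\mathbf{x}_0 = \begin{bmatrix} 9 \\ 1 \end{bmatrix}$.

a. Compute $\mathbf{x}_1 = A\mathbf{x}_0$. [Hint: You do not need to know $A$ itself.]

b. Find a formula for $\mathbf{x}_k$ involving $k$ and the eigenvectors $\mathbf{v}_1$ and $\mathbf{v}_2$.

2. Suppose the eigenvalues of a $3 \times 3$ matrix $A$ are 3, 4/5, and 3/5, with corresponding eigenvectors
$\begin{bmatrix} 1 \\ 0 \\ -3 \end{bmatrix}$, $\begin{bmatrix} 2 \\ 1 \\ -5 \end{bmatrix}$, and $\begin{bmatrix} -3 \\ -3 \\ 7 \end{bmatrix}$. Let $\mathbf{x}_0 = \begin{bmatrix} -2 \\ -5 \\ 3 \end{bmatrix}$. Find the solution of the equation
$\mathbf{x}_{k+1} = A\mathbf{x}_k$ for the specified $\mathbf{x}_0$, and describe what happens as $k \to \infty$.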

































In Exercises 3-6, assume that any initial vector $\mathbf{x}_0$ has an eigenvector
decomposition such that the coefficient $c_1$ in equation (1)
of this section is positive.$^3$


3. Determine the evolution of the dynamical system in Example 1
when the predation parameter $p$ is .2 in equation (3).
(Give a formula for $\mathbf{x}_k$.) Does the owl population grow or
decline? What about the wood rat population?

4. Determine the evolution of the dynamical system in Example
1 when the predation parameter $p$ is .125. (Give a formula
for $\mathbf{x}_k$.) As time passes, what happens to the sizes of the owl
and wood rat populations? The system tends toward what is
sometimes called an unstable equilibrium. What do you think
might happen to the system if some aspect of the model (such
as birth rates or the predation rate) were to change slightly?



5. In old-growth forests of Douglas fir, the spotted owl dines
mainly on flying squirrels. Suppose the predator-prey matrix 


for these two populations is A = 




. Show that 


if the predation parameter p is .325, both populations grow. 
Estimate the long-term growth rate and the eventual ratio of 
owls to flying squirrels. 


6. Show that if the predation parameter $p$ in Exercise 5 is .5,
both the owls and the squirrels will eventually perish. Find a 
value of p for which populations of both owls and squirrels 
tend toward constant levels. What are the relative population 
sizes in this case? 


7. Let $A$ have the properties described in Exercise 1.

a. Is the origin an attractor, a repeller, or a saddle point of
the dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$?

b. Find the directions of greatest attraction and/or repulsion
for this dynamical system.

c. Make a graphical description of the system, showing
the directions of greatest attraction or repulsion. Include
a rough sketch of several typical trajectories (without
computing specific points).

8. Determine the nature of the origin (attractor, repeller, or
saddle point) for the dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$ if $A$ has
the properties described in Exercise 2. Find the directions of
greatest attraction or repulsion.


In Exercises 9-14, classify the origin as an attractor, repeller,
or saddle point of the dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$. Find the
directions of greatest attraction and/or repulsion.


9. A = 



10. A = 




$^3$ One of the limitations of the model in Example 1 is that there always exist
initial population vectors $\mathbf{x}_0$ with positive entries such that the coefficient
$c_1$ is negative. The approximation (7) is still valid, but the entries in $\mathbf{x}_k$
eventually become negative.


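11. $A = \begin{bmatrix} .4 & .5 \\ -.4 & 1.3 \end{bmatrix}$

12. $A = \begin{bmatrix} .5 & .6 \\ -.3 & 1.4 \end{bmatrix}$

13. $A = \begin{bmatrix} .8 & .3 \\ -.4 & 1.5 \end{bmatrix}$

14. $A = \begin{bmatrix} 1.7 & .6 \\ -.4 & .7 \end{bmatrix}$

15. Let $A = \begin{bmatrix} .4 & 0 & .2 \\ .3 & .8 & .3 \\ .3 & .2 & .5 \end{bmatrix}$. The vector $\mathbf{v}_1 = \begin{bmatrix} .1 \\ .6 \\ .3 \end{bmatrix}$ is an
eigenvector for $A$, and two eigenvalues are .5 and .2. Construct
the solution of the dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$ that
satisfies $\mathbf{x}_0 = (0, .3, .7)$. What happens to $\mathbf{x}_k$ as $k \to \infty$?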


16. [M] Produce the general solution of the dynamical system
$\mathbf{x}_{k+1} = A\mathbf{x}_k$ when $A$ is the stochastic matrix for the Hertz
Rent A Car model in Exercise 16 of Section 4.9.

17. Construct a stage-matrix model for an animal species that has
two life stages: juvenile (up to 1 year old) and adult. Suppose
the female adults give birth each year to an average of 1.6
female juveniles. Each year, 30% of the juveniles survive
to become adults and 80% of the adults survive. For $k \ge 0$,
let $\mathbf{x}_k = (j_k, a_k)$, where the entries in $\mathbf{x}_k$ are the numbers of
female juveniles and female adults in year $k$.

a. Construct the stage-matrix $A$ such that $\mathbf{x}_{k+1} = A\mathbf{x}_k$ for
$k \ge 0$.

b. Show that the population is growing, compute the eventual
growth rate of the population, and give the eventual
ratio of juveniles to adults.

c. [M] Suppose that initially there are 15 juveniles and 10
adults in the population. Produce four graphs that show
how the population changes over eight years: (a) the
number of juveniles, (b) the number of adults, (c) the
total population, and (d) the ratio of juveniles to adults
(each year). When does the ratio in (d) seem to stabilize?
Include a listing of the program or keystrokes used to
produce the graphs for (c) and (d).

18. A herd of American buffalo (bison) can be modeled by a stage
matrix similar to that for the spotted owls. The females can be
divided into calves (up to 1 year old), yearlings (1 to 2 years),
and adults. Suppose an average of 42 female calves are
born each year per 100 adult females. (Only adults produce
offspring.) Each year, about 60% of the calves survive, 75%
of the yearlings survive, and 95% of the adults survive. For
$k \ge 0$, let $\mathbf{x}_k = (c_k, y_k, a_k)$, where the entries in $\mathbf{x}_k$ are the
numbers of females in each life stage at year $k$.

a. Construct the stage-matrix $A$ for the buffalo herd, such
that $\mathbf{x}_{k+1} = A\mathbf{x}_k$ for $k \ge 0$.

b. [M] Show that the buffalo herd is growing, determine
the expected growth rate after many years, and give the
expected numbers of calves and yearlings present per 100
adults.

SOLUTIONS TO PRACTICE PROBLEMS


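1. The first step is to write $\mathbf{x}_0$ as a linear combination of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$. Row reduction
of $[\,\mathbf{v}_1\ \ \mathbf{v}_2\ \ \mathbf{v}_3\ \ \mathbf{x}_0\,]$ produces the weights $c_1 = 2$, $c_2 = 1$, and $c_3 = 3$, so that

$$\mathbf{x}_0 = 2\mathbf{v}_1 + 1\mathbf{v}_2 + 3\mathbf{v}_3$$

Since the eigenvalues are 1, $\frac{2}{3}$, and $\frac{1}{3}$, the general solution is

$$\mathbf{x}_k = 2 \cdot 1^k\,\mathbf{v}_1 + 1 \cdot \left(\tfrac{2}{3}\right)^k \mathbf{v}_2 + 3 \cdot \left(\tfrac{1}{3}\right)^k \mathbf{v}_3
= 2\begin{bmatrix} -2 \\ 2 \\ 1 \end{bmatrix} + \left(\tfrac{2}{3}\right)^k \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix} + 3\left(\tfrac{1}{3}\right)^k \begin{bmatrix} 1 \\ 2 \\ -2 \end{bmatrix} \tag{12}$$

2. As $k \to \infty$, the second and third terms in (12) tend to the zero vector, and

$$\mathbf{x}_k = 2\mathbf{v}_1 + \left(\tfrac{2}{3}\right)^k \mathbf{v}_2 + 3\left(\tfrac{1}{3}\right)^k \mathbf{v}_3 \to 2\mathbf{v}_1 = \begin{bmatrix} -4 \\ 4 \\ 2 \end{bmatrix}$$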
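The limit in Practice Problem 2 can be checked numerically. The sketch below evaluates formula (12) in exact arithmetic and also confirms one step of the recursion against the matrix. The signs of the entries of $A$, $\mathbf{v}_1$, $\mathbf{v}_3$ and the vector $\mathbf{x}_0$ are taken as stated in Practice Problem 1 (an assumption wherever the printed copy is hard to read); the helper names `x` and `mat_vec` are illustrative only.

```python
from fractions import Fraction as F

# Data assumed from Practice Problem 1 and its solution.
A = [[F(7, 9), F(-2, 9), F(0)],
     [F(-2, 9), F(6, 9), F(2, 9)],
     [F(0), F(2, 9), F(5, 9)]]
vectors = [(-2, 2, 1), (2, 1, 2), (1, 2, -2)]   # v1, v2, v3
weights = [2, 1, 3]                             # c1, c2, c3
eigenvalues = [F(1), F(2, 3), F(1, 3)]

def x(k):
    """x_k = c1*1^k*v1 + c2*(2/3)^k*v2 + c3*(1/3)^k*v3, as exact fractions."""
    return [sum(c * lam**k * v[i]
                for c, lam, v in zip(weights, eigenvalues, vectors))
            for i in range(3)]

def mat_vec(M, u):
    """Matrix-vector product M u."""
    return [sum(M[i][j] * u[j] for j in range(len(u))) for i in range(len(M))]

for k in (0, 1, 5, 20, 40):
    print(k, [round(float(e), 6) for e in x(k)])
```

The printed vectors should drift toward $2\mathbf{v}_1$ as $k$ grows, because the factors $(2/3)^k$ and $(1/3)^k$ shrink geometrically.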


5.7 APPLICATIONS TO DIFFERENTIAL EQUATIONS 


This section describes continuous analogues of the difference equations studied in 
Section 5.6. In many applied problems, several quantities are varying continuously in 
time, and they are related by a system of differential equations: 


$$\begin{aligned}
x_1' &= a_{11}x_1 + \cdots + a_{1n}x_n \\
x_2' &= a_{21}x_1 + \cdots + a_{2n}x_n \\
&\ \ \vdots \\
x_n' &= a_{n1}x_1 + \cdots + a_{nn}x_n
\end{aligned}$$

Here $x_1, \ldots, x_n$ are differentiable functions of $t$, with derivatives $x_1', \ldots, x_n'$, and the $a_{ij}$
are constants. The crucial feature of this system is that it is linear. To see this, write the
system as a matrix differential equation

$$\mathbf{x}'(t) = A\mathbf{x}(t) \tag{1}$$

where

$$\mathbf{x}(t) = \begin{bmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{bmatrix},\quad
\mathbf{x}'(t) = \begin{bmatrix} x_1'(t) \\ \vdots \\ x_n'(t) \end{bmatrix},\quad\text{and}\quad
A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix}$$

A solution of equation (1) is a vector-valued function that satisfies (1) for all $t$ in some
interval of real numbers, such as $t \ge 0$.

Equation (1) is linear because both differentiation of functions and multiplication of
vectors by a matrix are linear transformations. Thus, if $\mathbf{u}$ and $\mathbf{v}$ are solutions of $\mathbf{x}' = A\mathbf{x}$,
then $c\mathbf{u} + d\mathbf{v}$ is also a solution, because

$$(c\mathbf{u} + d\mathbf{v})' = c\mathbf{u}' + d\mathbf{v}' = cA\mathbf{u} + dA\mathbf{v} = A(c\mathbf{u} + d\mathbf{v})$$

(Engineers call this property superposition of solutions.) Also, the identically zero
function is a (trivial) solution of (1). In the terminology of Chapter 4, the set of all
solutions of (1) is a subspace of the set of all continuous functions with values in $\mathbb{R}^n$.

Standard texts on differential equations show that there always exists what is called
a fundamental set of solutions to (1). If $A$ is $n \times n$, then there are $n$ linearly independent
functions in a fundamental set, and each solution of (1) is a unique linear combination
of these $n$ functions. That is, a fundamental set of solutions is a basis for the set of
all solutions of (1), and the solution set is an $n$-dimensional vector space of functions.
If a vector $\mathbf{x}_0$ is specified, then the initial value problem is to construct the (unique)
function $\mathbf{x}$ such that $\mathbf{x}' = A\mathbf{x}$ and $\mathbf{x}(0) = \mathbf{x}_0$.

When A is a diagonal matrix, the solutions of (1) can be produced by elementary 
calculus. For instance, consider 


$$\begin{bmatrix} x_1'(t) \\ x_2'(t) \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ 0 & -5 \end{bmatrix}\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} \tag{2}$$

that is,

$$\begin{aligned} x_1'(t) &= 3x_1(t) \\ x_2'(t) &= -5x_2(t) \end{aligned} \tag{3}$$

The system (2) is said to be decoupled because each derivative of a function depends
only on the function itself, not on some combination or "coupling" of both $x_1(t)$ and
$x_2(t)$. From calculus, the solutions of (3) are $x_1(t) = c_1e^{3t}$ and $x_2(t) = c_2e^{-5t}$, for any
constants $c_1$ and $c_2$. Each solution of equation (2) can be written in the form

$$\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} = \begin{bmatrix} c_1e^{3t} \\ c_2e^{-5t} \end{bmatrix} = c_1\begin{bmatrix} 1 \\ 0 \end{bmatrix}e^{3t} + c_2\begin{bmatrix} 0 \\ 1 \end{bmatrix}e^{-5t}$$

This example suggests that for the general equation $\mathbf{x}' = A\mathbf{x}$, a solution might be a
linear combination of functions of the form

$$\mathbf{x}(t) = \mathbf{v}e^{\lambda t} \tag{4}$$

for some scalar $\lambda$ and some fixed nonzero vector $\mathbf{v}$. [If $\mathbf{v} = \mathbf{0}$, the function $\mathbf{x}(t)$ is
identically zero and hence satisfies $\mathbf{x}' = A\mathbf{x}$.] Observe that

$\mathbf{x}'(t) = \lambda\mathbf{v}e^{\lambda t}$     By calculus, since $\mathbf{v}$ is a constant vector
$A\mathbf{x}(t) = A\mathbf{v}e^{\lambda t}$     Multiplying both sides of (4) by $A$

Since $e^{\lambda t}$ is never zero, $\mathbf{x}'(t)$ will equal $A\mathbf{x}(t)$ if and only if $\lambda\mathbf{v} = A\mathbf{v}$, that is, if and only
if $\lambda$ is an eigenvalue of $A$ and $\mathbf{v}$ is a corresponding eigenvector. Thus each
eigenvalue-eigenvector pair provides a solution (4) of $\mathbf{x}' = A\mathbf{x}$. Such solutions are sometimes called
eigenfunctions of the differential equation. Eigenfunctions provide the key to solving
systems of differential equations.


EXAMPLE 1 The circuit in Figure 1 can be described by the differential equation

$$\begin{bmatrix} x_1'(t) \\ x_2'(t) \end{bmatrix} = \begin{bmatrix} -(1/R_1 + 1/R_2)/C_1 & 1/(R_2C_1) \\ 1/(R_2C_2) & -1/(R_2C_2) \end{bmatrix}\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix}$$

where $x_1(t)$ and $x_2(t)$ are the voltages across the two capacitors at time $t$. Suppose
resistor $R_1$ is 1 ohm, $R_2$ is 2 ohms, capacitor $C_1$ is 1 farad, and $C_2$ is .5 farad, and
suppose there is an initial charge of 5 volts on capacitor $C_1$ and 4 volts on capacitor $C_2$.
Find formulas for $x_1(t)$ and $x_2(t)$ that describe how the voltages change over time.

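FIGURE 1

SOLUTION Let $A$ denote the matrix displayed above, and let $\mathbf{x}(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix}$. For the
data given, $A = \begin{bmatrix} -1.5 & .5 \\ 1 & -1 \end{bmatrix}$, and $\mathbf{x}(0) = \begin{bmatrix} 5 \\ 4 \end{bmatrix}$. The eigenvalues of $A$ are $\lambda_1 = -.5$
and $\lambda_2 = -2$, with corresponding eigenvectors

$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \quad\text{and}\quad \mathbf{v}_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$$

The eigenfunctions $\mathbf{x}_1(t) = \mathbf{v}_1e^{\lambda_1t}$ and $\mathbf{x}_2(t) = \mathbf{v}_2e^{\lambda_2t}$ both satisfy $\mathbf{x}' = A\mathbf{x}$, and so does
any linear combination of $\mathbf{x}_1$ and $\mathbf{x}_2$. Set

$$\mathbf{x}(t) = c_1\mathbf{v}_1e^{\lambda_1t} + c_2\mathbf{v}_2e^{\lambda_2t} = c_1\begin{bmatrix} 1 \\ 2 \end{bmatrix}e^{-.5t} + c_2\begin{bmatrix} -1 \\ 1 \end{bmatrix}e^{-2t}$$

and note that $\mathbf{x}(0) = c_1\mathbf{v}_1 + c_2\mathbf{v}_2$. Since $\mathbf{v}_1$ and $\mathbf{v}_2$ are obviously linearly independent
and hence span $\mathbb{R}^2$, $c_1$ and $c_2$ can be found to make $\mathbf{x}(0)$ equal to $\mathbf{x}_0$. In fact, the equation

$$c_1\underbrace{\begin{bmatrix} 1 \\ 2 \end{bmatrix}}_{\mathbf{v}_1} + c_2\underbrace{\begin{bmatrix} -1 \\ 1 \end{bmatrix}}_{\mathbf{v}_2} = \underbrace{\begin{bmatrix} 5 \\ 4 \end{bmatrix}}_{\mathbf{x}_0}$$

leads easily to $c_1 = 3$ and $c_2 = -2$. Thus the desired solution of the differential equation
$\mathbf{x}' = A\mathbf{x}$ is

$$\mathbf{x}(t) = 3\begin{bmatrix} 1 \\ 2 \end{bmatrix}e^{-.5t} - 2\begin{bmatrix} -1 \\ 1 \end{bmatrix}e^{-2t}$$

or

$$\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} = \begin{bmatrix} 3e^{-.5t} + 2e^{-2t} \\ 6e^{-.5t} - 2e^{-2t} \end{bmatrix}$$

Figure 2 shows the graph, or trajectory, of $\mathbf{x}(t)$, for $t \ge 0$, along with trajectories for
some other initial points. The trajectories of the two eigenfunctions $\mathbf{x}_1$ and $\mathbf{x}_2$ lie in the
eigenspaces of $A$.

The functions $\mathbf{x}_1$ and $\mathbf{x}_2$ both decay to zero as $t \to \infty$, but the values of $\mathbf{x}_2$
decay faster because its exponent is more negative. The entries in the corresponding
eigenvector $\mathbf{v}_2$ show that the voltages across the capacitors will decay to zero as rapidly
as possible if the initial voltages are equal in magnitude but opposite in sign. ■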
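A quick numerical sanity check of Example 1 is sketched below. It assumes the matrix $A = \begin{bmatrix} -1.5 & .5 \\ 1 & -1 \end{bmatrix}$ obtained from the stated circuit values and the closed-form solution given in the example; the function names and the finite-difference step `h` are illustrative choices, not part of the text.

```python
import math

# Matrix from the circuit data: R1 = 1, R2 = 2, C1 = 1, C2 = .5
A = [[-1.5, 0.5],
     [1.0, -1.0]]

def x(t):
    """Closed-form solution of Example 1: 3*v1*e^(-.5t) - 2*v2*e^(-2t)."""
    return [3 * math.exp(-0.5 * t) + 2 * math.exp(-2 * t),
            6 * math.exp(-0.5 * t) - 2 * math.exp(-2 * t)]

def residual(t, h=1e-6):
    """Largest entry of |x'(t) - A x(t)|, with x'(t) from a central difference."""
    dx = [(a - b) / (2 * h) for a, b in zip(x(t + h), x(t - h))]
    xt = x(t)
    Ax = [sum(A[i][j] * xt[j] for j in range(2)) for i in range(2)]
    return max(abs(d - a) for d, a in zip(dx, Ax))

print("x(0) =", x(0))
for t in (0.0, 0.5, 2.0):
    print(t, residual(t))
```

The residuals should be at the level of finite-difference error, which supports both the initial condition and the differential equation.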



FIGURE 2 The origin as an attractor.

In Figure 2, the origin is called an attractor, or sink, of the dynamical system
because all trajectories are drawn into the origin. The direction of greatest attraction
is along the trajectory of the eigenfunction $\mathbf{x}_2$ (along the line through $\mathbf{0}$ and $\mathbf{v}_2$)
corresponding to the more negative eigenvalue, $\lambda = -2$. Trajectories that begin at points
not on this line become asymptotic to the line through $\mathbf{0}$ and $\mathbf{v}_1$ because their components
in the $\mathbf{v}_2$ direction decay so rapidly.

If the eigenvalues in Example 1 were positive instead of negative, the corresponding 
trajectories would be similar in shape, but the trajectories would be traversed away from 
the origin. In such a case, the origin is called a repeller, or source, of the dynamical 
system, and the direction of greatest repulsion is the line containing the trajectory of the 
eigenfunction corresponding to the more positive eigenvalue. 


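EXAMPLE 2 Suppose a particle is moving in a planar force field and its position
vector $\mathbf{x}$ satisfies $\mathbf{x}' = A\mathbf{x}$ and $\mathbf{x}(0) = \mathbf{x}_0$, where

$$A = \begin{bmatrix} 4 & -5 \\ -2 & 1 \end{bmatrix}, \quad \mathbf{x}_0 = \begin{bmatrix} 2.9 \\ 2.6 \end{bmatrix}$$

Solve this initial value problem for $t \ge 0$, and sketch the trajectory of the particle.


SOLUTION The eigenvalues of $A$ turn out to be $\lambda_1 = 6$ and $\lambda_2 = -1$, with corresponding
eigenvectors $\mathbf{v}_1 = (-5, 2)$ and $\mathbf{v}_2 = (1, 1)$. For any constants $c_1$ and $c_2$, the function

$$\mathbf{x}(t) = c_1\mathbf{v}_1e^{\lambda_1t} + c_2\mathbf{v}_2e^{\lambda_2t} = c_1\begin{bmatrix} -5 \\ 2 \end{bmatrix}e^{6t} + c_2\begin{bmatrix} 1 \\ 1 \end{bmatrix}e^{-t}$$

is a solution of $\mathbf{x}' = A\mathbf{x}$. We want $c_1$ and $c_2$ to satisfy $\mathbf{x}(0) = \mathbf{x}_0$, that is,

$$c_1\begin{bmatrix} -5 \\ 2 \end{bmatrix} + c_2\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 2.9 \\ 2.6 \end{bmatrix} \quad\text{or}\quad \begin{bmatrix} -5 & 1 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} 2.9 \\ 2.6 \end{bmatrix}$$

Calculations show that $c_1 = -3/70$ and $c_2 = 188/70$, and so the desired function is

$$\mathbf{x}(t) = \frac{-3}{70}\begin{bmatrix} -5 \\ 2 \end{bmatrix}e^{6t} + \frac{188}{70}\begin{bmatrix} 1 \\ 1 \end{bmatrix}e^{-t}$$

Trajectories of $\mathbf{x}$ and other solutions are shown in Figure 3.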
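The weights in Example 2 come from a $2 \times 2$ linear system, so they can be reproduced exactly. The sketch below uses Cramer's rule with rational arithmetic; the function name `weights_2x2` is hypothetical, and the data ($\mathbf{v}_1 = (-5, 2)$, $\mathbf{v}_2 = (1, 1)$, $\mathbf{x}_0 = (2.9, 2.6)$) are taken from the example.

```python
from fractions import Fraction as F

def weights_2x2(v1, v2, x0):
    """Solve c1*v1 + c2*v2 = x0 for (c1, c2) by Cramer's rule."""
    det = v1[0] * v2[1] - v2[0] * v1[1]
    c1 = (x0[0] * v2[1] - v2[0] * x0[1]) / det
    c2 = (v1[0] * x0[1] - x0[0] * v1[1]) / det
    return c1, c2

v1 = (F(-5), F(2))
v2 = (F(1), F(1))
x0 = (F(29, 10), F(26, 10))   # 2.9 and 2.6 as exact fractions

c1, c2 = weights_2x2(v1, v2, x0)
print(c1, c2)   # fractions are printed in lowest terms
```

Because $c_1 \ne 0$, the solution has a component along $\mathbf{v}_1$ that grows like $e^{6t}$, which is why the trajectory eventually moves away from the origin.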


In Figure 3, the origin is called a saddle point of the dynamical system because 
some trajectories approach the origin at first and then change direction and move away 
from the origin. A saddle point arises whenever the matrix A has both positive and 
negative eigenvalues. The direction of greatest repulsion is the line through $\mathbf{v}_1$ and $\mathbf{0}$,
corresponding to the positive eigenvalue. The direction of greatest attraction is the line
through $\mathbf{v}_2$ and $\mathbf{0}$, corresponding to the negative eigenvalue.



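FIGURE 3 The origin as a saddle point.

Decoupling a Dynamical System


The following discussion shows that the method of Examples 1 and 2 produces a
fundamental set of solutions for any dynamical system described by $\mathbf{x}' = A\mathbf{x}$ when $A$
is $n \times n$ and has $n$ linearly independent eigenvectors, that is, when $A$ is diagonalizable.
Suppose the eigenfunctions for $A$ are

$$\mathbf{v}_1e^{\lambda_1t}, \ldots, \mathbf{v}_ne^{\lambda_nt}$$

with $\mathbf{v}_1, \ldots, \mathbf{v}_n$ linearly independent eigenvectors. Let $P = [\,\mathbf{v}_1\ \cdots\ \mathbf{v}_n\,]$, and let $D$ be
the diagonal matrix with entries $\lambda_1, \ldots, \lambda_n$, so that $A = PDP^{-1}$. Now make a change
of variable, defining a new function $\mathbf{y}$ by

$$\mathbf{y}(t) = P^{-1}\mathbf{x}(t) \quad\text{or, equivalently,}\quad \mathbf{x}(t) = P\mathbf{y}(t)$$

The equation $\mathbf{x}(t) = P\mathbf{y}(t)$ says that $\mathbf{y}(t)$ is the coordinate vector of $\mathbf{x}(t)$ relative to the
eigenvector basis. Substitution of $P\mathbf{y}$ for $\mathbf{x}$ in the equation $\mathbf{x}' = A\mathbf{x}$ gives

$$\frac{d}{dt}(P\mathbf{y}) = A(P\mathbf{y}) = (PDP^{-1})P\mathbf{y} = PD\mathbf{y} \tag{5}$$

Since $P$ is a constant matrix, the left side of (5) is $P\mathbf{y}'$. Left-multiply both sides of (5)
by $P^{-1}$ and obtain $\mathbf{y}' = D\mathbf{y}$, or

$$\begin{bmatrix} y_1'(t) \\ y_2'(t) \\ \vdots \\ y_n'(t) \end{bmatrix} = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \lambda_n \end{bmatrix}\begin{bmatrix} y_1(t) \\ y_2(t) \\ \vdots \\ y_n(t) \end{bmatrix}$$

The change of variable from $\mathbf{x}$ to $\mathbf{y}$ has decoupled the system of differential equations,
because the derivative of each scalar function $y_k$ depends only on $y_k$. (Review the
analogous change of variables in Section 5.6.) Since $y_1' = \lambda_1y_1$, we have $y_1(t) = c_1e^{\lambda_1t}$,
with similar formulas for $y_2, \ldots, y_n$. Thus

$$\mathbf{y}(t) = \begin{bmatrix} c_1e^{\lambda_1t} \\ \vdots \\ c_ne^{\lambda_nt} \end{bmatrix}, \quad\text{where}\quad \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} = \mathbf{y}(0) = P^{-1}\mathbf{x}(0) = P^{-1}\mathbf{x}_0$$

To obtain the general solution $\mathbf{x}$ of the original system, compute

$$\mathbf{x}(t) = P\mathbf{y}(t) = [\,\mathbf{v}_1\ \cdots\ \mathbf{v}_n\,]\,\mathbf{y}(t) = c_1\mathbf{v}_1e^{\lambda_1t} + \cdots + c_n\mathbf{v}_ne^{\lambda_nt}$$

This is the eigenfunction expansion constructed as in Example 1.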
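The decoupling steps above (compute $\mathbf{y}(0) = P^{-1}\mathbf{x}_0$, evolve each $y_i$ independently, then map back with $P$) can be sketched for the $2 \times 2$ case as follows. This is a minimal illustration under the assumption that the eigenvalues and eigenvectors are already known; the function name `solve_by_decoupling` is hypothetical, and the usage lines reuse the data of Example 1.

```python
import math

def solve_by_decoupling(P, lams, x0):
    """Return a function t -> x(t) solving x' = Ax, x(0) = x0,
    where A = P D P^{-1}, D = diag(lams), and P is 2x2 (columns = eigenvectors)."""
    (a, b), (c, d) = P
    det = a * d - b * c
    # y(0) = P^{-1} x0, using the explicit 2x2 inverse
    y0 = [(d * x0[0] - b * x0[1]) / det,
          (-c * x0[0] + a * x0[1]) / det]

    def x(t):
        # Decoupled system: y_i(t) = c_i * e^(lam_i * t)
        y = [y0[i] * math.exp(lams[i] * t) for i in range(2)]
        # Change back to the original variable: x = P y
        return [a * y[0] + b * y[1], c * y[0] + d * y[1]]

    return x

# Example 1 data: v1 = (1, 2), v2 = (-1, 1), eigenvalues -.5 and -2, x0 = (5, 4)
x = solve_by_decoupling([[1.0, -1.0], [2.0, 1.0]], (-0.5, -2.0), (5.0, 4.0))
print(x(0.0), x(1.0))
```

For larger systems the explicit inverse would normally be replaced by solving $P\mathbf{c} = \mathbf{x}_0$ by row reduction, as the examples in the text do.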


Complex Eigenvalues 

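In the next example, a real matrix $A$ has a pair of complex eigenvalues $\lambda$ and $\bar{\lambda}$, with
associated complex eigenvectors $\mathbf{v}$ and $\bar{\mathbf{v}}$. (Recall from Section 5.5 that for a real matrix,
complex eigenvalues and associated eigenvectors come in conjugate pairs.) So two
solutions of $\mathbf{x}' = A\mathbf{x}$ are

$$\mathbf{x}_1(t) = \mathbf{v}e^{\lambda t} \quad\text{and}\quad \mathbf{x}_2(t) = \bar{\mathbf{v}}e^{\bar{\lambda}t} \tag{6}$$

It can be shown that $\mathbf{x}_2(t) = \overline{\mathbf{x}_1(t)}$ by using a power series representation for the
complex exponential function. Although the complex eigenfunctions $\mathbf{x}_1$ and $\mathbf{x}_2$ are
convenient for some calculations (particularly in electrical engineering), real functions
are more appropriate for many purposes. Fortunately, the real and imaginary parts of $\mathbf{x}_1$
are (real) solutions of $\mathbf{x}' = A\mathbf{x}$, because they are linear combinations of the solutions
in (6):

$$\operatorname{Re}(\mathbf{v}e^{\lambda t}) = \frac{1}{2}\left[\mathbf{x}_1(t) + \overline{\mathbf{x}_1(t)}\right], \quad \operatorname{Im}(\mathbf{v}e^{\lambda t}) = \frac{1}{2i}\left[\mathbf{x}_1(t) - \overline{\mathbf{x}_1(t)}\right]$$

FIGURE 4

To understand the nature of $\operatorname{Re}(\mathbf{v}e^{\lambda t})$, recall from calculus that for any number $x$,
the exponential function $e^x$ can be computed from the power series:

$$e^x = 1 + x + \frac{1}{2!}x^2 + \cdots + \frac{1}{n!}x^n + \cdots$$

This series can be used to define $e^{\lambda t}$ when $\lambda$ is complex:

$$e^{\lambda t} = 1 + (\lambda t) + \frac{1}{2!}(\lambda t)^2 + \cdots + \frac{1}{n!}(\lambda t)^n + \cdots$$

By writing $\lambda = a + bi$ (with $a$ and $b$ real), and using similar power series for the cosine
and sine functions, one can show that

$$e^{(a+bi)t} = e^{at} \cdot e^{ibt} = e^{at}(\cos bt + i\sin bt) \tag{7}$$

Hence

$$\begin{aligned}
\mathbf{v}e^{\lambda t} &= (\operatorname{Re}\mathbf{v} + i\operatorname{Im}\mathbf{v}) \cdot e^{at}(\cos bt + i\sin bt) \\
&= [\,(\operatorname{Re}\mathbf{v})\cos bt - (\operatorname{Im}\mathbf{v})\sin bt\,]e^{at} + i\,[\,(\operatorname{Re}\mathbf{v})\sin bt + (\operatorname{Im}\mathbf{v})\cos bt\,]e^{at}
\end{aligned}$$

So two real solutions of $\mathbf{x}' = A\mathbf{x}$ are

$$\begin{aligned}
\mathbf{y}_1(t) &= \operatorname{Re}\mathbf{x}_1(t) = [\,(\operatorname{Re}\mathbf{v})\cos bt - (\operatorname{Im}\mathbf{v})\sin bt\,]e^{at} \\
\mathbf{y}_2(t) &= \operatorname{Im}\mathbf{x}_1(t) = [\,(\operatorname{Re}\mathbf{v})\sin bt + (\operatorname{Im}\mathbf{v})\cos bt\,]e^{at}
\end{aligned}$$

It can be shown that $\mathbf{y}_1$ and $\mathbf{y}_2$ are linearly independent functions (when $b \ne 0$).$^1$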


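EXAMPLE 3 The circuit in Figure 4 can be described by the equation

$$\begin{bmatrix} i_L' \\ v_C' \end{bmatrix} = \begin{bmatrix} -R_2/L & -1/L \\ 1/C & -1/(R_1C) \end{bmatrix}\begin{bmatrix} i_L \\ v_C \end{bmatrix}$$

where $i_L$ is the current passing through the inductor $L$ and $v_C$ is the voltage drop across
the capacitor $C$. Suppose $R_1$ is 5 ohms, $R_2$ is .8 ohm, $C$ is .1 farad, and $L$ is .4 henry.
Find formulas for $i_L$ and $v_C$, if the initial current through the inductor is 3 amperes and
the initial voltage across the capacitor is 3 volts.


SOLUTION For the data given, $A = \begin{bmatrix} -2 & -2.5 \\ 10 & -2 \end{bmatrix}$ and $\mathbf{x}_0 = \begin{bmatrix} 3 \\ 3 \end{bmatrix}$. The method discussed
in Section 5.5 produces the eigenvalue $\lambda = -2 + 5i$ and the corresponding
eigenvector $\mathbf{v}_1 = \begin{bmatrix} i \\ 2 \end{bmatrix}$. The complex solutions of $\mathbf{x}' = A\mathbf{x}$ are complex linear combinations of

$$\mathbf{x}_1(t) = \begin{bmatrix} i \\ 2 \end{bmatrix}e^{(-2+5i)t} \quad\text{and}\quad \mathbf{x}_2(t) = \begin{bmatrix} -i \\ 2 \end{bmatrix}e^{(-2-5i)t}$$

$^1$ Since $\mathbf{x}_2(t)$ is the complex conjugate of $\mathbf{x}_1(t)$, the real and imaginary parts of $\mathbf{x}_2(t)$ are $\mathbf{y}_1(t)$ and $-\mathbf{y}_2(t)$,
respectively. Thus one can use either $\mathbf{x}_1(t)$ or $\mathbf{x}_2(t)$, but not both, to produce two real linearly independent
solutions of $\mathbf{x}' = A\mathbf{x}$.

Next, use equation (7) to write

$$\mathbf{x}_1(t) = \begin{bmatrix} i \\ 2 \end{bmatrix}e^{-2t}(\cos 5t + i\sin 5t)$$

FIGURE 5 The origin as a spiral point.

The real and imaginary parts of $\mathbf{x}_1$ provide real solutions:

$$\mathbf{y}_1(t) = \begin{bmatrix} -\sin 5t \\ 2\cos 5t \end{bmatrix}e^{-2t}, \quad \mathbf{y}_2(t) = \begin{bmatrix} \cos 5t \\ 2\sin 5t \end{bmatrix}e^{-2t}$$

Since $\mathbf{y}_1$ and $\mathbf{y}_2$ are linearly independent functions, they form a basis for the
two-dimensional real vector space of solutions of $\mathbf{x}' = A\mathbf{x}$. Thus the general solution is

$$\mathbf{x}(t) = c_1\begin{bmatrix} -\sin 5t \\ 2\cos 5t \end{bmatrix}e^{-2t} + c_2\begin{bmatrix} \cos 5t \\ 2\sin 5t \end{bmatrix}e^{-2t}$$

To satisfy $\mathbf{x}(0) = \begin{bmatrix} 3 \\ 3 \end{bmatrix}$, we need $c_1\begin{bmatrix} 0 \\ 2 \end{bmatrix} + c_2\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 3 \end{bmatrix}$, which leads to $c_1 = 1.5$ and
$c_2 = 3$. Thus

$$\mathbf{x}(t) = 1.5\begin{bmatrix} -\sin 5t \\ 2\cos 5t \end{bmatrix}e^{-2t} + 3\begin{bmatrix} \cos 5t \\ 2\sin 5t \end{bmatrix}e^{-2t}$$

or

$$\begin{bmatrix} i_L(t) \\ v_C(t) \end{bmatrix} = \begin{bmatrix} -1.5\sin 5t + 3\cos 5t \\ 3\cos 5t + 6\sin 5t \end{bmatrix}e^{-2t}$$

See Figure 5.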
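The passage from the complex eigenfunction to the real solution in Example 3 can be checked with complex arithmetic. The sketch below assumes the eigenvalue $\lambda = -2 + 5i$, the eigenvector $(i, 2)$, and the weights $c_1 = 1.5$, $c_2 = 3$ from the example; the function names are illustrative.

```python
import cmath
import math

lam = complex(-2, 5)   # eigenvalue -2 + 5i
v = (1j, 2)            # eigenvector (i, 2)

def x1(t):
    """Complex eigenfunction v * e^(lam * t)."""
    e = cmath.exp(lam * t)
    return [vi * e for vi in v]

def x_from_parts(t):
    """Real solution 1.5 * Re x1(t) + 3 * Im x1(t)."""
    return [1.5 * z.real + 3 * z.imag for z in x1(t)]

def x_closed_form(t):
    """Formula for (i_L(t), v_C(t)) stated at the end of Example 3."""
    s, c, d = math.sin(5 * t), math.cos(5 * t), math.exp(-2 * t)
    return [(-1.5 * s + 3 * c) * d, (3 * c + 6 * s) * d]

for t in (0.0, 0.1, 0.7, 2.0):
    print(t, x_from_parts(t), x_closed_form(t))
```

The two columns of output should agree to rounding error, and both entries shrink with the factor $e^{-2t}$ while oscillating, which matches the inward spiral in Figure 5.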


In Figure 5, the origin is called a spiral point of the dynamical system. The rotation 
is caused by the sine and cosine functions that arise from a complex eigenvalue. The 
trajectories spiral inward because the factor $e^{-2t}$ tends to zero. Recall that $-2$ is the real
part of the eigenvalue in Example 3. When A has a complex eigenvalue with positive 
real part, the trajectories spiral outward. If the real part of the eigenvalue is zero, the 
trajectories form ellipses around the origin. 


PRACTICE PROBLEMS 

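A real $3 \times 3$ matrix $A$ has eigenvalues $-.5$, $.2 + .3i$, and $.2 - .3i$, with corresponding
eigenvectors

$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 1 + 2i \\ 4i \\ 2 \end{bmatrix}, \quad\text{and}\quad \mathbf{v}_3 = \begin{bmatrix} 1 - 2i \\ -4i \\ 2 \end{bmatrix}$$

1. Is $A$ diagonalizable as $A = PDP^{-1}$, using complex matrices?

2. Write the general solution of $\mathbf{x}' = A\mathbf{x}$ using complex eigenfunctions, and then find
the general real solution.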

3. Describe the shapes of typical trajectories. 


5.7 EXERCISES 



A particle moving in a planar force field has a position vector 
x that satisfies x' = Ax. The 2x2 matrix A has eigenvalues 


4 and 2, with corresponding eigenvectors Vi = 


-3 

1 


and 




. Find the position of the particle at time t. 


assuming that x(0)

2. Let $A$ be a $2 \times 2$ matrix with eigenvalues $-3$ and $-1$ and
corresponding eigenvectors $\mathbf{v}_1 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$. Let
$\mathbf{x}(t)$ be the position of a particle at time $t$. Solve the initial

r 2 

value problem x' = Ax, x(0) = 


In Exercises 3-6, solve the initial value problem x'(t) = Ax(t) 
for t >0, with x(0) = (3,2). Classify the nature of the origin 
as an attractor, repeller, or saddle point of the dynamical system 
described by x ' = Ax. Find the directions of greatest attraction 
and/or repulsion. When the origin is a saddle point, sketch typical 
trajectories. 


3. A = 




5. A = 





In Exercises 7 and 8, make a change of variable that decouples the 
equation x' = Ax. Write the equation x(f) = Py(t) and show the 
calculation that leads to the uncoupled system y' = D y, specify¬ 
ing P and D. 

7. A as in Exercise 5 8. A as in Exercise 6 


In Exercises 9-18, construct the general solution of x' = Ax 
involving complex eigenfunctions and then obtain the general real 
solution. Describe the shapes of typical trajectories. 


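9. A =

10. A =

11. A =

12. A =

13. A =

14. A =

15. [M] $A = \begin{bmatrix} & -12 & -6 \\ & 1 & 2 \\ & 12 & 5 \end{bmatrix}$

16. [M] $A = \begin{bmatrix} -6 & -11 & 16 \\ 2 & 5 & -4 \\ -4 & -5 & 10 \end{bmatrix}$

17. [M] $A = \begin{bmatrix} 30 & 64 & 23 \\ -11 & -23 & -9 \\ 6 & 15 & 4 \end{bmatrix}$

18. [M] $A = \begin{bmatrix} 53 & -30 & -2 \\ 90 & -52 & -3 \\ 20 & -10 & 2 \end{bmatrix}$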

19. [M] Find formulas for the voltages $v_1$ and $v_2$ (as functions of
time $t$) for the circuit in Example 1, assuming that $R_1 = 1/5$
ohm, $R_2 = 1/3$ ohm, $C_1 = 4$ farads, $C_2 = 3$ farads, and the
initial charge on each capacitor is 4 volts.


20. [M] Find formulas for the voltages $v_1$ and $v_2$ for the circuit in
Example 1, assuming that $R_1 = 1/15$ ohm, $R_2 = 1/3$ ohm,
$C_1 = 9$ farads, $C_2 = 2$ farads, and the initial charge on each
capacitor is 3 volts.


21. [M] Find formulas for the current $i_L$ and the voltage $v_C$
for the circuit in Example 3, assuming that $R_1 = 1$ ohm,
$R_2 = .125$ ohm, $C = .2$ farad, $L = .125$ henry, the initial
current is 0 amp, and the initial voltage is 15 volts.


22. [M] The circuit in the figure is described by the equation

$$\begin{bmatrix} i_L' \\ v_C' \end{bmatrix} = \begin{bmatrix} 0 & 1/L \\ -1/C & -1/(RC) \end{bmatrix}\begin{bmatrix} i_L \\ v_C \end{bmatrix}$$

where $i_L$ is the current through the inductor $L$ and $v_C$ is the
voltage drop across the capacitor $C$. Find formulas for $i_L$
and $v_C$ when $R = .5$ ohm, $C = 2.5$ farads, $L = .5$ henry,
the initial current is 0 amp, and the initial voltage is 12 volts.





SOLUTIONS TO PRACTICE PROBLEMS 

1. Yes, the $3 \times 3$ matrix is diagonalizable because it has three distinct eigenvalues.
Theorem 2 in Section 5.1 and Theorem 5 in Section 5.3 are valid when complex
scalars are used. (The proofs are essentially the same as for real scalars.)

2. The general solution has the form

$$\mathbf{x}(t) = c_1\begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}e^{-.5t} + c_2\begin{bmatrix} 1 + 2i \\ 4i \\ 2 \end{bmatrix}e^{(.2+.3i)t} + c_3\begin{bmatrix} 1 - 2i \\ -4i \\ 2 \end{bmatrix}e^{(.2-.3i)t}$$

The scalars $c_1$, $c_2$, and $c_3$ here can be any complex numbers. The first term in $\mathbf{x}(t)$ is
real. Two more real solutions can be produced using the real and imaginary parts of
the second term in $\mathbf{x}(t)$:

$$\begin{bmatrix} 1 + 2i \\ 4i \\ 2 \end{bmatrix}e^{.2t}(\cos .3t + i\sin .3t)$$

The general real solution has the following form, with real scalars $c_1$, $c_2$, and $c_3$:

$$c_1\begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}e^{-.5t} + c_2\begin{bmatrix} \cos .3t - 2\sin .3t \\ -4\sin .3t \\ 2\cos .3t \end{bmatrix}e^{.2t} + c_3\begin{bmatrix} \sin .3t + 2\cos .3t \\ 4\cos .3t \\ 2\sin .3t \end{bmatrix}e^{.2t}$$

3. Any solution with $c_2 = c_3 = 0$ is attracted to the origin because of the negative
exponential factor. Other solutions have components that grow without bound, and
the trajectories spiral outward.

Be careful not to mistake this problem for one in Section 5.6. There the condition
for attraction toward $\mathbf{0}$ was that an eigenvalue be less than 1 in magnitude, to make
$|\lambda|^k \to 0$. Here the real part of the eigenvalue must be negative, to make $e^{\lambda t} \to 0$.


5.8 ITERATIVE ESTIMATES FOR EIGENVALUES 


In scientific applications of linear algebra, eigenvalues are seldom known precisely. 
Fortunately, a close numerical approximation is usually quite satisfactory. In fact, some 
applications require only a rough approximation to the largest eigenvalue. The first 
algorithm described below can work well for this case. Also, it provides a foundation 
for a more powerful method that can give fast estimates for other eigenvalues as well. 


The Power Method 

The power method applies to an $n \times n$ matrix $A$ with a strictly dominant eigenvalue
$\lambda_1$, which means that $\lambda_1$ must be larger in absolute value than all the other eigenvalues.
In this case, the power method produces a scalar sequence that approaches $\lambda_1$ and a
vector sequence that approaches a corresponding eigenvector. The background for the
method rests on the eigenvector decomposition used at the beginning of Section 5.6.

Assume for simplicity that $A$ is diagonalizable and $\mathbb{R}^n$ has a basis of eigenvectors
$\mathbf{v}_1, \ldots, \mathbf{v}_n$, arranged so their corresponding eigenvalues $\lambda_1, \ldots, \lambda_n$ decrease in size, with
the strictly dominant eigenvalue first. That is,

$$|\lambda_1| > |\lambda_2| \ge |\lambda_3| \ge \cdots \ge |\lambda_n| \tag{1}$$

where the first inequality is strict ("strictly larger").

As we saw in equation (2) of Section 5.6, if $\mathbf{x}$ in $\mathbb{R}^n$ is written as $\mathbf{x} = c_1\mathbf{v}_1 + \cdots + c_n\mathbf{v}_n$,
then

$$A^k\mathbf{x} = c_1(\lambda_1)^k\mathbf{v}_1 + c_2(\lambda_2)^k\mathbf{v}_2 + \cdots + c_n(\lambda_n)^k\mathbf{v}_n \quad (k = 1, 2, \ldots)$$

Assume $c_1 \ne 0$. Then, dividing by $(\lambda_1)^k$,

$$\frac{1}{(\lambda_1)^k}A^k\mathbf{x} = c_1\mathbf{v}_1 + c_2\left(\frac{\lambda_2}{\lambda_1}\right)^k\mathbf{v}_2 + \cdots + c_n\left(\frac{\lambda_n}{\lambda_1}\right)^k\mathbf{v}_n \quad (k = 1, 2, \ldots) \tag{2}$$

From inequality (1), the fractions $\lambda_2/\lambda_1, \ldots, \lambda_n/\lambda_1$ are all less than 1 in magnitude and
so their powers go to zero. Hence

$$(\lambda_1)^{-k}A^k\mathbf{x} \to c_1\mathbf{v}_1 \quad\text{as } k \to \infty \tag{3}$$

Thus, for large $k$, a scalar multiple of $A^k\mathbf{x}$ determines almost the same direction as the
eigenvector $c_1\mathbf{v}_1$. Since positive scalar multiples do not change the direction of a vector,
$A^k\mathbf{x}$ itself points almost in the same direction as $\mathbf{v}_1$ or $-\mathbf{v}_1$, provided $c_1 \ne 0$.


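EXAMPLE 1 Let $A = \begin{bmatrix} 1.8 & .8 \\ .2 & 1.2 \end{bmatrix}$, $\mathbf{v}_1 = \begin{bmatrix} 4 \\ 1 \end{bmatrix}$, and $\mathbf{x} = \begin{bmatrix} -.5 \\ 1 \end{bmatrix}$. Then $A$ has
eigenvalues 2 and 1, and the eigenspace for $\lambda_1 = 2$ is the line through $\mathbf{0}$ and $\mathbf{v}_1$. For
$k = 0, \ldots, 8$, compute $A^k\mathbf{x}$ and construct the line through $\mathbf{0}$ and $A^k\mathbf{x}$. What happens as
$k$ increases?


SOLUTION The first three calculations are

$$A\mathbf{x} = \begin{bmatrix} 1.8 & .8 \\ .2 & 1.2 \end{bmatrix}\begin{bmatrix} -.5 \\ 1 \end{bmatrix} = \begin{bmatrix} -.1 \\ 1.1 \end{bmatrix}$$

$$A^2\mathbf{x} = A(A\mathbf{x}) = \begin{bmatrix} 1.8 & .8 \\ .2 & 1.2 \end{bmatrix}\begin{bmatrix} -.1 \\ 1.1 \end{bmatrix} = \begin{bmatrix} .7 \\ 1.3 \end{bmatrix}$$

$$A^3\mathbf{x} = A(A^2\mathbf{x}) = \begin{bmatrix} 1.8 & .8 \\ .2 & 1.2 \end{bmatrix}\begin{bmatrix} .7 \\ 1.3 \end{bmatrix} = \begin{bmatrix} 2.3 \\ 1.7 \end{bmatrix}$$

Analogous calculations complete Table 1.


TABLE 1 Iterates of a Vector

k                0      1      2      3      4      5      6      7      8
A^k x (entry 1)  -.5    -.1    .7     2.3    5.5    11.9   24.7   50.3   101.5
A^k x (entry 2)  1      1.1    1.3    1.7    2.5    4.1    7.3    13.7   26.5


The vectors $\mathbf{x}, A\mathbf{x}, \ldots, A^4\mathbf{x}$ are shown in Figure 1. The other vectors are growing
too long to display. However, line segments are drawn showing the directions of those
vectors. In fact, the directions of the vectors are what we really want to see, not the
vectors themselves. The lines seem to be approaching the line representing the eigenspace
spanned by $\mathbf{v}_1$. More precisely, the angle between the line (subspace) determined by
$A^k\mathbf{x}$ and the line (eigenspace) determined by $\mathbf{v}_1$ goes to zero as $k \to \infty$. ■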
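The iterates in Table 1 are easy to regenerate. The following sketch assumes the matrix $A$ and starting vector $\mathbf{x}$ that appear in the displayed calculations of Example 1, and it also prints the ratio of the two entries of each iterate; that ratio is an added diagnostic, chosen because the eigenspace for $\lambda_1 = 2$ is the line through $(4, 1)$, so the ratio should drift toward 4.

```python
A = [[1.8, 0.8],
     [0.2, 1.2]]
x = [-0.5, 1.0]

iterates = [x]
for _ in range(8):
    # One multiplication by A gives the next iterate A^(k+1) x.
    x = [A[0][0] * x[0] + A[0][1] * x[1],
         A[1][0] * x[0] + A[1][1] * x[1]]
    iterates.append(x)

for k, (a, b) in enumerate(iterates):
    print(k, round(a, 4), round(b, 4), "ratio:", round(a / b, 4))
```

The growing lengths explain why only the first few vectors fit in Figure 1, while the stabilizing ratio reflects the convergence of directions.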


FIGURE 1 Directions determined by $\mathbf{x}, A\mathbf{x}, A^2\mathbf{x}, \ldots, A^7\mathbf{x}$.


The vectors $(\lambda_1)^{-k}A^k\mathbf{x}$ in (3) are scaled to make them converge to $c_1\mathbf{v}_1$, provided
$c_1 \ne 0$. We cannot scale $A^k\mathbf{x}$ in this way because we do not know $\lambda_1$. But we can scale
each $A^k\mathbf{x}$ to make its largest entry a 1. It turns out that the resulting sequence $\{\mathbf{x}_k\}$ will
converge to a multiple of $\mathbf{v}_1$ whose largest entry is 1. Figure 2 shows the scaled sequence
for Example 1. The eigenvalue $\lambda_1$ can be estimated from the sequence $\{\mathbf{x}_k\}$, too. When
$\mathbf{x}_k$ is close to an eigenvector for $\lambda_1$, the vector $A\mathbf{x}_k$ is close to $\lambda_1\mathbf{x}_k$, with each entry in
$A\mathbf{x}_k$ approximately $\lambda_1$ times the corresponding entry in $\mathbf{x}_k$. Because the largest entry in
$\mathbf{x}_k$ is 1, the largest entry in $A\mathbf{x}_k$ is close to $\lambda_1$. (Careful proofs of these statements are
omitted.)






THE POWER METHOD FOR ESTIMATING A STRICTLY DOMINANT EIGENVALUE

1. Select an initial vector $\mathbf{x}_0$ whose largest entry is 1.

2. For $k = 0, 1, \ldots$,

a. Compute $A\mathbf{x}_k$.

b. Let $\mu_k$ be an entry in $A\mathbf{x}_k$ whose absolute value is as large as possible.

c. Compute $\mathbf{x}_{k+1} = (1/\mu_k)A\mathbf{x}_k$.

3. For almost all choices of $\mathbf{x}_0$, the sequence $\{\mu_k\}$ approaches the dominant
eigenvalue, and the sequence $\{\mathbf{x}_k\}$ approaches a corresponding eigenvector.
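The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production routine: the function name `power_method` and the fixed iteration count are assumptions, there is no convergence test, and the usage lines borrow the matrix $\begin{bmatrix} 6 & 5 \\ 1 & 2 \end{bmatrix}$ and starting vector $(0, 1)$ from Example 2.

```python
def power_method(A, x0, steps):
    """Run `steps` iterations of the power method.

    Returns (mus, xs): the scalars mu_k and the scaled vectors x_0, x_1, ...
    """
    n = len(A)
    x = list(x0)
    mus, xs = [], [x]
    for _ in range(steps):
        # Step 2a: compute A x_k
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Step 2b: mu_k is an entry of A x_k of largest absolute value
        mu = max(Ax, key=abs)
        # Step 2c: x_{k+1} = (1 / mu_k) A x_k, so its largest entry is 1
        x = [a / mu for a in Ax]
        mus.append(mu)
        xs.append(x)
    return mus, xs

mus, xs = power_method([[6.0, 5.0], [1.0, 2.0]], [0.0, 1.0], 6)
for k, mu in enumerate(mus):
    print(k, round(mu, 5), [round(e, 5) for e in xs[k]])
```

Note that `max(..., key=abs)` keeps the sign of the chosen entry, which matters when the dominant eigenvalue is negative.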


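EXAMPLE 2 Apply the power method to $A = \begin{bmatrix} 6 & 5 \\ 1 & 2 \end{bmatrix}$ with $\mathbf{x}_0 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$. Stop
when $k = 5$, and estimate the dominant eigenvalue and a corresponding eigenvector
of $A$.


SOLUTION Calculations in this example and the next were made with MATLAB,
which computes with 16-digit accuracy, although we show only a few significant figures
here. To begin, compute $A\mathbf{x}_0$ and identify the largest entry $\mu_0$ in $A\mathbf{x}_0$:

$$A\mathbf{x}_0 = \begin{bmatrix} 6 & 5 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 5 \\ 2 \end{bmatrix}, \quad \mu_0 = 5$$

Scale $A\mathbf{x}_0$ by $1/\mu_0$ to get $\mathbf{x}_1$, compute $A\mathbf{x}_1$, and identify the largest entry in $A\mathbf{x}_1$:

$$\mathbf{x}_1 = \frac{1}{\mu_0}A\mathbf{x}_0 = \frac{1}{5}\begin{bmatrix} 5 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \\ .4 \end{bmatrix}, \quad A\mathbf{x}_1 = \begin{bmatrix} 6 & 5 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} 1 \\ .4 \end{bmatrix} = \begin{bmatrix} 8 \\ 1.8 \end{bmatrix}, \quad \mu_1 = 8$$

Scale $A\mathbf{x}_1$ by $1/\mu_1$ to get $\mathbf{x}_2$, compute $A\mathbf{x}_2$, and identify the largest entry in $A\mathbf{x}_2$:

$$\mathbf{x}_2 = \frac{1}{\mu_1}A\mathbf{x}_1 = \frac{1}{8}\begin{bmatrix} 8 \\ 1.8 \end{bmatrix} = \begin{bmatrix} 1 \\ .225 \end{bmatrix}, \quad A\mathbf{x}_2 = \begin{bmatrix} 6 & 5 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} 1 \\ .225 \end{bmatrix} = \begin{bmatrix} 7.125 \\ 1.450 \end{bmatrix}, \quad \mu_2 = 7.125$$

Scale $A\mathbf{x}_2$ by $1/\mu_2$ to get $\mathbf{x}_3$, and so on. The results of MATLAB calculations for the
first five iterations are arranged in Table 2.


TABLE 2 The Power Method for Example 2

k                 0      1      2       3        4        5
x_k  (entry 1)    0      1      1       1        1        1
x_k  (entry 2)    1      .4     .225    .2035    .2005    .20007
Ax_k (entry 1)    5      8      7.125   7.0175   7.0025   7.00036
Ax_k (entry 2)    2      1.8    1.450   1.4070   1.4010   1.40014
mu_k              5      8      7.125   7.0175   7.0025   7.00036


The evidence from Table 2 strongly suggests that $\{\mathbf{x}_k\}$ approaches $(1, .2)$ and $\{\mu_k\}$
approaches 7. If so, then $(1, .2)$ is an eigenvector and 7 is the dominant eigenvalue. This
is easily verified by computing

$$A\begin{bmatrix} 1 \\ .2 \end{bmatrix} = \begin{bmatrix} 6 & 5 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} 1 \\ .2 \end{bmatrix} = \begin{bmatrix} 7 \\ 1.4 \end{bmatrix} = 7\begin{bmatrix} 1 \\ .2 \end{bmatrix}$$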



The sequence $\{\mu_k\}$ in Example 2 converged quickly to $\lambda_1 = 7$ because the second
eigenvalue of $A$ was much smaller. (In fact, $\lambda_2 = 1$.) In general, the rate of convergence
depends on the ratio $|\lambda_2/\lambda_1|$, because the vector $c_2(\lambda_2/\lambda_1)^k\mathbf{v}_2$ in equation (2) is the main
source of error when using a scaled version of $A^k\mathbf{x}$ as an estimate of $c_1\mathbf{v}_1$. (The other
fractions $\lambda_j/\lambda_1$ are likely to be smaller.) If $|\lambda_2/\lambda_1|$ is close to 1, then $\{\mu_k\}$ and $\{\mathbf{x}_k\}$ can
converge very slowly, and other approximation methods may be preferred.

With the power method, there is a slight chance that the chosen initial vector $\mathbf{x}$
will have no component in the $\mathbf{v}_1$ direction (when $c_1 = 0$). But computer rounding
errors during the calculations of the $\mathbf{x}_k$ are likely to create a vector with at least a small
component in the direction of $\mathbf{v}_1$. If that occurs, the $\mathbf{x}_k$ will start to converge to a multiple
of $\mathbf{v}_1$.
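The dependence of the convergence rate on $|\lambda_2/\lambda_1|$ can be observed directly for the matrix of Example 2, where $\lambda_1 = 7$ and $\lambda_2 = 1$. The sketch below tracks the error $\mu_k - 7$ and the ratio of successive errors; using the known value 7 is of course only possible in a test setting, and the variable names are illustrative.

```python
A = [[6.0, 5.0],
     [1.0, 2.0]]
x = [0.0, 1.0]

errors = []
for k in range(8):
    Ax = [A[0][0] * x[0] + A[0][1] * x[1],
          A[1][0] * x[0] + A[1][1] * x[1]]
    mu = max(Ax, key=abs)
    errors.append(mu - 7.0)       # error of the k-th eigenvalue estimate
    x = [a / mu for a in Ax]      # rescale so the largest entry is 1

# Ratio of successive errors; expected to settle near |lambda_2 / lambda_1| = 1/7.
ratios = [errors[k + 1] / errors[k] for k in range(len(errors) - 1)]
for k, r in enumerate(ratios):
    print(k, round(r, 5))
```

Each iteration therefore gains a bit less than one decimal digit here; a ratio close to 1 would make the same loop painfully slow.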


The Inverse Power Method 

This method provides an approximation for any eigenvalue, provided a good initial estimate $\alpha$ of the eigenvalue $\lambda$ is known. In this case, we let $B = (A - \alpha I)^{-1}$ and apply the power method to $B$. It can be shown that if the eigenvalues of $A$ are $\lambda_1, \ldots, \lambda_n$, then the eigenvalues of $B$ are

$$\frac{1}{\lambda_1 - \alpha},\quad \frac{1}{\lambda_2 - \alpha},\quad \ldots,\quad \frac{1}{\lambda_n - \alpha}$$

and the corresponding eigenvectors are the same as those for $A$. (See Exercises 15 and 16.)

Suppose, for example, that $\alpha$ is closer to $\lambda_2$ than to the other eigenvalues of $A$. Then $1/(\lambda_2 - \alpha)$ will be a strictly dominant eigenvalue of $B$. If $\alpha$ is really close to $\lambda_2$, then $1/(\lambda_2 - \alpha)$ is much larger than the other eigenvalues of $B$, and the inverse power method produces a very rapid approximation to $\lambda_2$ for almost all choices of $\mathbf{x}_0$. The following algorithm gives the details.

THE INVERSE POWER METHOD FOR ESTIMATING AN EIGENVALUE $\lambda$ OF $A$

1. Select an initial estimate $\alpha$ sufficiently close to $\lambda$.

2. Select an initial vector $\mathbf{x}_0$ whose largest entry is 1.

3. For $k = 0, 1, \ldots$,

a. Solve $(A - \alpha I)\mathbf{y}_k = \mathbf{x}_k$ for $\mathbf{y}_k$.

b. Let $\mu_k$ be an entry in $\mathbf{y}_k$ whose absolute value is as large as possible.

c. Compute $\nu_k = \alpha + (1/\mu_k)$.

d. Compute $\mathbf{x}_{k+1} = (1/\mu_k)\mathbf{y}_k$.

4. For almost all choices of $\mathbf{x}_0$, the sequence $\{\nu_k\}$ approaches the eigenvalue $\lambda$ of $A$, and the sequence $\{\mathbf{x}_k\}$ approaches a corresponding eigenvector.


Notice that $B$, or rather $(A - \alpha I)^{-1}$, does not appear in the algorithm. Instead of computing $(A - \alpha I)^{-1}\mathbf{x}_k$ to get the next vector in the sequence, it is better to solve the equation $(A - \alpha I)\mathbf{y}_k = \mathbf{x}_k$ for $\mathbf{y}_k$ (and then scale $\mathbf{y}_k$ to produce $\mathbf{x}_{k+1}$). Since this equation for $\mathbf{y}_k$ must be solved for each $k$, an LU factorization of $A - \alpha I$ will speed up the process.
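The steps of the inverse power method can be sketched as follows for a $2 \times 2$ matrix. This is a minimal illustration under stated assumptions, not the text's MATLAB code: the helper names `solve2` and `inverse_power_method` are hypothetical, the matrix is the $2 \times 2$ matrix consistent with Table 2 (eigenvalues 7 and 1), and $\alpha = 6.5$ is an assumed rough estimate of the eigenvalue 7. Cramer's rule stands in for the LU factorization that a practical implementation would use.

```python
def solve2(M, b):
    """Solve the 2x2 system M y = b by Cramer's rule (assumes det M != 0)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(b[0] * M[1][1] - M[0][1] * b[1]) / det,
            (M[0][0] * b[1] - b[0] * M[1][0]) / det]


def inverse_power_method(A, alpha, x, steps):
    # Form A - alpha*I once; it is reused in every iteration
    M = [[A[0][0] - alpha, A[0][1]],
         [A[1][0], A[1][1] - alpha]]
    nu = None
    for _ in range(steps):
        y = solve2(M, x)            # step 3a: solve (A - alpha I) y_k = x_k
        mu = max(y, key=abs)        # step 3b: entry of largest absolute value
        nu = alpha + 1.0 / mu       # step 3c: eigenvalue estimate nu_k
        x = [yi / mu for yi in y]   # step 3d: x_{k+1} = (1/mu_k) y_k
    return nu, x


A = [[6.0, 5.0], [1.0, 2.0]]        # assumed matrix with eigenvalues 7 and 1
nu, x = inverse_power_method(A, 6.5, [1.0, 0.0], 15)
print(nu, x)
```

With this choice of $\alpha$, the estimates $\nu_k$ should approach 7 and the vectors $\mathbf{x}_k$ should approach the eigenvector $(1, .2)$.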


EXAMPLE 3 It is not uncommon in some applications to need to know the smallest eigenvalue of a matrix $A$ and to have at hand rough estimates of the eigenvalues. Suppose 21, 3.3, and 1.9 are estimates for the eigenvalues of the matrix $A$ below. Find the smallest eigenvalue, accurate to six decimal places.

$$A = \begin{bmatrix} 10 & -8 & -4 \\ -8 & 13 & 4 \\ -4 & 5 & 4 \end{bmatrix}$$

SOLUTION The two smallest eigenvalues seem close together, so we use the inverse power method for $A - 1.9I$. Results of a MATLAB calculation are shown in Table 3. Here $\mathbf{x}_0$ was chosen arbitrarily, $\mathbf{y}_k = (A - 1.9I)^{-1}\mathbf{x}_k$, $\mu_k$ is the largest entry in $\mathbf{y}_k$, $\nu_k = 1.9 + 1/\mu_k$, and $\mathbf{x}_{k+1} = (1/\mu_k)\mathbf{y}_k$. As it turns out, the initial eigenvalue estimate was fairly good, and the inverse power sequence converged quickly. The smallest eigenvalue is exactly 2. ■


TABLE 3 The Inverse Power Method 



If an estimate for the smallest eigenvalue of a matrix is not available, one can simply take $\alpha = 0$ in the inverse power method. This choice of $\alpha$ works reasonably well if the smallest eigenvalue is much closer to zero than to the other eigenvalues.

The two algorithms presented in this section are practical tools for many simple situations, and they provide an introduction to the problem of eigenvalue estimation. A more robust and widely used iterative method is the QR algorithm. For instance, it is the heart of the MATLAB command eig(A), which rapidly computes eigenvalues and eigenvectors of $A$. A brief description of the QR algorithm was given in the exercises for Section 5.2. Further details are presented in most modern numerical analysis texts.


PRACTICE PROBLEM 

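How can you tell if a given vector $\mathbf{x}$ is a good approximation to an eigenvector of a matrix $A$? If it is, how would you estimate the corresponding eigenvalue? Experiment with

$$A = \begin{bmatrix} 5 & 8 & 4 \\ 8 & 3 & -1 \\ 4 & -1 & 2 \end{bmatrix} \quad\text{and}\quad \mathbf{x} = \begin{bmatrix} 1.0 \\ -4.3 \\ 8.1 \end{bmatrix}$$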


5.8 EXERCISES 


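In Exercises 1-4, the matrix $A$ is followed by a sequence $\{\mathbf{x}_k\}$ produced by the power method. Use these data to estimate the largest eigenvalue of $A$, and give a corresponding eigenvector.

5. Let $A = \begin{bmatrix} 15 & 16 \\ -20 & -21 \end{bmatrix}$. The vectors $\mathbf{x}, \ldots, A^5\mathbf{x}$ are

$$\begin{bmatrix} 1 \\ 1 \end{bmatrix},\ \begin{bmatrix} 31 \\ -41 \end{bmatrix},\ \begin{bmatrix} -191 \\ 241 \end{bmatrix},\ \begin{bmatrix} 991 \\ -1241 \end{bmatrix},\ \begin{bmatrix} -4991 \\ 6241 \end{bmatrix},\ \begin{bmatrix} 24991 \\ -31241 \end{bmatrix}$$

Find a vector with a 1 in the second entry that is close to an eigenvector of $A$. Use four decimal places. Check your estimate, and give an estimate for the dominant eigenvalue of $A$.

6. Let $A = \begin{bmatrix} -2 & -3 \\ 6 & 7 \end{bmatrix}$. Repeat Exercise 5, using the following sequence $\mathbf{x}, A\mathbf{x}, \ldots, A^5\mathbf{x}$.

$$\begin{bmatrix} 1 \\ 1 \end{bmatrix},\ \begin{bmatrix} -5 \\ 13 \end{bmatrix},\ \begin{bmatrix} -29 \\ 61 \end{bmatrix},\ \begin{bmatrix} -125 \\ 253 \end{bmatrix},\ \begin{bmatrix} -509 \\ 1021 \end{bmatrix},\ \begin{bmatrix} -2045 \\ 4093 \end{bmatrix}$$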

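[M] Exercises 7-12 require MATLAB or other computational aid. In Exercises 7 and 8, use the power method with the $\mathbf{x}_0$ given. List $\{\mathbf{x}_k\}$ and $\{\mu_k\}$ for $k = 1, \ldots, 5$. In Exercises 9 and 10, list $\mu_5$ and $\mu_6$.

7. $A = \begin{bmatrix} 6 & 7 \\ 8 & 5 \end{bmatrix}$, $\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$

8. $A = \begin{bmatrix} 2 & 1 \\ 4 & 5 \end{bmatrix}$, $\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$

9. $A = \begin{bmatrix} 8 & 0 & 12 \\ 1 & -2 & 1 \\ 0 & 3 & 0 \end{bmatrix}$, $\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$

10. $A = \begin{bmatrix} 1 & 2 & -2 \\ 1 & 1 & 9 \\ 0 & 1 & 9 \end{bmatrix}$, $\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$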

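Another estimate can be made for an eigenvalue when an approximate eigenvector is available. Observe that if $A\mathbf{x} = \lambda\mathbf{x}$, then $\mathbf{x}^TA\mathbf{x} = \mathbf{x}^T(\lambda\mathbf{x}) = \lambda(\mathbf{x}^T\mathbf{x})$, and the Rayleigh quotient

$$R(\mathbf{x}) = \frac{\mathbf{x}^TA\mathbf{x}}{\mathbf{x}^T\mathbf{x}}$$

equals $\lambda$. If $\mathbf{x}$ is close to an eigenvector for $\lambda$, then this quotient is close to $\lambda$. When $A$ is a symmetric matrix ($A^T = A$), the Rayleigh quotient $R(\mathbf{x}_k) = (\mathbf{x}_k^TA\mathbf{x}_k)/(\mathbf{x}_k^T\mathbf{x}_k)$ will have roughly twice as many digits of accuracy as the scaling factor $\mu_k$ in the power method. Verify this increased accuracy in Exercises 11 and 12 by computing $\mu_k$ and $R(\mathbf{x}_k)$ for $k = 1, \ldots, 4$.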
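As a small illustration of the Rayleigh quotient, the sketch below evaluates $R(\mathbf{x})$ for the symmetric matrix and approximate eigenvector given in the Practice Problem of this section. The function name `rayleigh_quotient` is hypothetical, and the code is only a sketch of the formula, not part of the exercises.

```python
def rayleigh_quotient(A, x):
    """Return (x^T A x) / (x^T x) for a square matrix A and a nonzero vector x."""
    Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    return sum(xi * yi for xi, yi in zip(x, Ax)) / sum(xi * xi for xi in x)


A = [[5, 8, 4],
     [8, 3, -1],
     [4, -1, 2]]          # symmetric matrix from the Practice Problem
x = [1.0, -4.3, 8.1]      # approximate eigenvector from the Practice Problem
r = rayleigh_quotient(A, x)
print(r)
```

The printed value can be compared with the eigenvalue quoted in the solution to the Practice Problem.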
11. A = 




12. A = 




Exercises 13 and 14 apply to a $3 \times 3$ matrix $A$ whose eigenvalues are estimated to be 4, $-4$, and 3.

13. If the eigenvalues close to 4 and $-4$ are known to have different absolute values, will the power method work? Is it likely to be useful?

14. Suppose the eigenvalues close to 4 and $-4$ are known to have exactly the same absolute value. Describe how one might obtain a sequence that estimates the eigenvalue close to 4.

15. Suppose $A\mathbf{x} = \lambda\mathbf{x}$ with $\mathbf{x} \neq \mathbf{0}$. Let $\alpha$ be a scalar different from the eigenvalues of $A$, and let $B = (A - \alpha I)^{-1}$. Subtract $\alpha\mathbf{x}$ from both sides of the equation $A\mathbf{x} = \lambda\mathbf{x}$, and use algebra to show that $1/(\lambda - \alpha)$ is an eigenvalue of $B$, with $\mathbf{x}$ a corresponding eigenvector.

16. Suppose $\mu$ is an eigenvalue of the $B$ in Exercise 15, and that $\mathbf{x}$ is a corresponding eigenvector, so that $(A - \alpha I)^{-1}\mathbf{x} = \mu\mathbf{x}$. Use this equation to find an eigenvalue of $A$ in terms of $\mu$ and $\alpha$. [Note: $\mu \neq 0$ because $B$ is invertible.]

17. [M] Use the inverse power method to estimate the middle eigenvalue of the $A$ in Example 3, with accuracy to four decimal places. Set $\mathbf{x}_0 = (1, 0, 0)$.

18. [M] Let $A$ be as in Exercise 9. Use the inverse power method with $\mathbf{x}_0 = (1, 0, 0)$ to estimate the eigenvalue of $A$ near $\alpha = -1.4$, with an accuracy to four decimal places.


[M] In Exercises 19 and 20, find (a) the largest eigenvalue and (b) the eigenvalue closest to zero. In each case, set $\mathbf{x}_0 = (1, 0, 0, 0)$ and carry out approximations until the approximating sequence seems accurate to four decimal places. Include the approximate eigenvector.

19. $A = \begin{bmatrix} 10 & 7 & 8 & 7 \\ 7 & 5 & 6 & 5 \\ 8 & 6 & 10 & 9 \\ 7 & 5 & 9 & 10 \end{bmatrix}$

20. $A = \begin{bmatrix} 1 & 2 & 3 & 2 \\ 2 & 12 & 13 & 11 \\ -2 & 3 & 0 & 2 \\ 4 & 5 & 7 & 2 \end{bmatrix}$


21. A common misconception is that if $A$ has a strictly dominant eigenvalue, then, for any sufficiently large value of $k$, the vector $A^k\mathbf{x}$ is approximately equal to an eigenvector of $A$. For the three matrices below, study what happens to $A^k\mathbf{x}$ when $\mathbf{x} = (.5, .5)$, and try to draw general conclusions (for a $2 \times 2$ matrix).


a. A = 

c. A = 



0 

.2 

0 

2 


b . A = 


1 

0 



SOLUTION TO PRACTICE PROBLEM 


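For the given $A$ and $\mathbf{x}$,

$$A\mathbf{x} = \begin{bmatrix} 5 & 8 & 4 \\ 8 & 3 & -1 \\ 4 & -1 & 2 \end{bmatrix}\begin{bmatrix} 1.00 \\ -4.30 \\ 8.10 \end{bmatrix} = \begin{bmatrix} 3.00 \\ -13.00 \\ 24.50 \end{bmatrix}$$

If $A\mathbf{x}$ is nearly a multiple of $\mathbf{x}$, then the ratios of corresponding entries in the two vectors should be nearly constant. So compute:

{entry in Ax}  ÷  {entry in x}  =  {ratio}
    3.00             1.00           3.000
  -13.00            -4.30           3.023
   24.50             8.10           3.025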


Each entry in Ax is about 3 times the corresponding entry in x, so x is close to an 
eigenvector. Any of the ratios above is an estimate for the eigenvalue. (To five decimal 
places, the eigenvalue is 3.02409.) 
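The entry-by-entry check above is easy to automate. The following sketch (variable names are illustrative only) recomputes $A\mathbf{x}$ and the three ratios for the matrix and vector of the Practice Problem.

```python
A = [[5, 8, 4],
     [8, 3, -1],
     [4, -1, 2]]
x = [1.0, -4.3, 8.1]

# Ax, computed row by row
Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Ratio of each entry of Ax to the corresponding entry of x
ratios = [y / xi for y, xi in zip(Ax, x)]
print([round(y, 2) for y in Ax])
print([round(r, 3) for r in ratios])
```

If the ratios agree to within a small tolerance, $\mathbf{x}$ is close to an eigenvector, and any of the ratios serves as an eigenvalue estimate.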
CHAPTER 5 SUPPLEMENTARY EXERCISES 


Throughout these supplementary exercises, A and B represent 
square matrices of appropriate sizes. 


1. Mark each statement as True or False. Justify each answer. 

a. If $A$ is invertible and 1 is an eigenvalue for $A$, then 1 is also an eigenvalue of $A^{-1}$.

b. If $A$ is row equivalent to the identity matrix $I$, then $A$ is diagonalizable.

c. If $A$ contains a row or column of zeros, then 0 is an eigenvalue of $A$.

d. Each eigenvalue of $A$ is also an eigenvalue of $A^2$.

e. Each eigenvector of $A$ is also an eigenvector of $A^2$.

f. Each eigenvector of an invertible matrix $A$ is also an eigenvector of $A^{-1}$.

g. Eigenvalues must be nonzero scalars.

h. Eigenvectors must be nonzero vectors.

i. Two eigenvectors corresponding to the same eigenvalue are always linearly dependent.

j. Similar matrices always have exactly the same eigenvalues.

k. Similar matrices always have exactly the same eigenvectors.

l. The sum of two eigenvectors of a matrix $A$ is also an eigenvector of $A$.

m. The eigenvalues of an upper triangular matrix $A$ are exactly the nonzero entries on the diagonal of $A$.

n. The matrices $A$ and $A^T$ have the same eigenvalues, counting multiplicities.

o. If a $5 \times 5$ matrix $A$ has fewer than 5 distinct eigenvalues, then $A$ is not diagonalizable.

p. There exists a $2 \times 2$ matrix that has no eigenvectors in $\mathbb{R}^2$.

q. If $A$ is diagonalizable, then the columns of $A$ are linearly independent.

r. A nonzero vector cannot correspond to two different eigenvalues of $A$.

s. A (square) matrix $A$ is invertible if and only if there is a coordinate system in which the transformation $\mathbf{x} \mapsto A\mathbf{x}$ is represented by a diagonal matrix.

t. If each vector $\mathbf{e}_j$ in the standard basis for $\mathbb{R}^n$ is an eigenvector of $A$, then $A$ is a diagonal matrix.

u. If $A$ is similar to a diagonalizable matrix $B$, then $A$ is also diagonalizable.

v. If $A$ and $B$ are invertible $n \times n$ matrices, then $AB$ is similar to $BA$.

w. An $n \times n$ matrix with $n$ linearly independent eigenvectors is invertible.

x. If $A$ is an $n \times n$ diagonalizable matrix, then each vector in $\mathbb{R}^n$ can be written as a linear combination of eigenvectors of $A$.

2. Show that if $\mathbf{x}$ is an eigenvector of the matrix product $AB$ and $B\mathbf{x} \neq \mathbf{0}$, then $B\mathbf{x}$ is an eigenvector of $BA$.

3. Suppose $\mathbf{x}$ is an eigenvector of $A$ corresponding to an eigenvalue $\lambda$.

a. Show that $\mathbf{x}$ is an eigenvector of $5I - A$. What is the corresponding eigenvalue?

b. Show that $\mathbf{x}$ is an eigenvector of $5I - 3A + A^2$. What is the corresponding eigenvalue?

4. Use mathematical induction to show that if $\lambda$ is an eigenvalue of an $n \times n$ matrix $A$, with $\mathbf{x}$ a corresponding eigenvector, then, for each positive integer $m$, $\lambda^m$ is an eigenvalue of $A^m$, with $\mathbf{x}$ a corresponding eigenvector.

5. If $p(t) = c_0 + c_1t + c_2t^2 + \cdots + c_nt^n$, define $p(A)$ to be the matrix formed by replacing each power of $t$ in $p(t)$ by the corresponding power of $A$ (with $A^0 = I$). That is,

$$p(A) = c_0I + c_1A + c_2A^2 + \cdots + c_nA^n$$

Show that if $\lambda$ is an eigenvalue of $A$, then one eigenvalue of $p(A)$ is $p(\lambda)$.

6. Suppose $A = PDP^{-1}$, where $P$ is $2 \times 2$ and $D = \begin{bmatrix} 2 & 0 \\ 0 & 7 \end{bmatrix}$.

a. Let $B = 5I - 3A + A^2$. Show that $B$ is diagonalizable by finding a suitable factorization of $B$.

b. Given $p(t)$ and $p(A)$ as in Exercise 5, show that $p(A)$ is diagonalizable.

7. Suppose $A$ is diagonalizable and $p(t)$ is the characteristic polynomial of $A$. Define $p(A)$ as in Exercise 5, and show that $p(A)$ is the zero matrix. This fact, which is also true for any square matrix, is called the Cayley-Hamilton theorem.

8. a. Let $A$ be a diagonalizable $n \times n$ matrix. Show that if the multiplicity of an eigenvalue $\lambda$ is $n$, then $A = \lambda I$.

b. Use part (a) to show that the matrix $A = \begin{bmatrix} 3 & 1 \\ 0 & 3 \end{bmatrix}$ is not diagonalizable.


9. Show that $I - A$ is invertible when all the eigenvalues of $A$ are less than 1 in magnitude. [Hint: What would be true if $I - A$ were not invertible?]

10. Show that if $A$ is diagonalizable, with all eigenvalues less than 1 in magnitude, then $A^k$ tends to the zero matrix as $k \to \infty$. [Hint: Consider $A^k\mathbf{x}$ where $\mathbf{x}$ represents any one of the columns of $I$.]

11. Let $\mathbf{u}$ be an eigenvector of $A$ corresponding to an eigenvalue $\lambda$, and let $H$ be the line in $\mathbb{R}^n$ through $\mathbf{u}$ and the origin.

a. Explain why $H$ is invariant under $A$ in the sense that $A\mathbf{x}$ is in $H$ whenever $\mathbf{x}$ is in $H$.

b. Let $K$ be a one-dimensional subspace of $\mathbb{R}^n$ that is invariant under $A$. Explain why $K$ contains an eigenvector of $A$.


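12. Let $G = \begin{bmatrix} A & X \\ 0 & B \end{bmatrix}$. Use formula (1) for the determinant in Section 5.2 to explain why $\det G = (\det A)(\det B)$. From this, deduce that the characteristic polynomial of $G$ is the product of the characteristic polynomials of $A$ and $B$.

Use Exercise 12 to find the eigenvalues of the matrices in Exercises 13 and 14.

13. $A = \begin{bmatrix} 3 & -2 & 8 \\ 0 & 5 & -2 \\ 0 & -4 & 3 \end{bmatrix}$

14. $A = \begin{bmatrix} 1 & 5 & -6 & -1 \\ 2 & 4 & 5 & 2 \\ 0 & 0 & -7 & -4 \\ 0 & 0 & 3 & 1 \end{bmatrix}$

15. Let $J$ be the $n \times n$ matrix of all 1's, and consider $A = (a - b)I + bJ$; that is,

$$A = \begin{bmatrix} a & b & b & \cdots & b \\ b & a & b & \cdots & b \\ b & b & a & \cdots & b \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ b & b & b & \cdots & a \end{bmatrix}$$

Use the results of Exercise 16 in the Supplementary Exercises for Chapter 3 to show that the eigenvalues of $A$ are $a - b$ and $a + (n - 1)b$. What are the multiplicities of these eigenvalues?

16. Apply the result of Exercise 15 to find the eigenvalues of the matrices

$$\begin{bmatrix} 1 & 2 & 2 \\ 2 & 1 & 2 \\ 2 & 2 & 1 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 7 & 3 & 3 & 3 & 3 \\ 3 & 7 & 3 & 3 & 3 \\ 3 & 3 & 7 & 3 & 3 \\ 3 & 3 & 3 & 7 & 3 \\ 3 & 3 & 3 & 3 & 7 \end{bmatrix}$$

17. Let $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$. Recall from Exercise 25 in Section 5.4 that $\operatorname{tr} A$ (the trace of $A$) is the sum of the diagonal entries in $A$. Show that the characteristic polynomial of $A$ is

$$\lambda^2 - (\operatorname{tr} A)\lambda + \det A$$

Then show that the eigenvalues of a $2 \times 2$ matrix $A$ are both real if and only if $\det A \le \left(\dfrac{\operatorname{tr} A}{2}\right)^2$.

18. Let $A = \begin{bmatrix} .4 & -.3 \\ .4 & 1.2 \end{bmatrix}$. Explain why $A^k$ approaches $\begin{bmatrix} -.5 & -.75 \\ 1.0 & 1.50 \end{bmatrix}$ as $k \to \infty$.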


Exercises 19-23 concern the polynomial

$$p(t) = a_0 + a_1t + \cdots + a_{n-1}t^{n-1} + t^n$$

and an $n \times n$ matrix $C_p$ called the companion matrix of $p$:

$$C_p = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & & 1 \\ -a_0 & -a_1 & -a_2 & \cdots & -a_{n-1} \end{bmatrix}$$

19. Write the companion matrix $C_p$ for $p(t) = 6 - 5t + t^2$, and then find the characteristic polynomial of $C_p$.

20. Let $p(t) = (t - 2)(t - 3)(t - 4) = -24 + 26t - 9t^2 + t^3$. Write the companion matrix for $p(t)$, and use techniques from Chapter 3 to find its characteristic polynomial.

21. Use mathematical induction to prove that for $n \ge 2$,

$$\det(C_p - \lambda I) = (-1)^n(a_0 + a_1\lambda + \cdots + a_{n-1}\lambda^{n-1} + \lambda^n) = (-1)^np(\lambda)$$

[Hint: Expanding by cofactors down the first column, show that $\det(C_p - \lambda I)$ has the form $(-\lambda)B + (-1)^na_0$, where $B$ is a certain polynomial (by the induction assumption).]

22. Let $p(t) = a_0 + a_1t + a_2t^2 + t^3$, and let $\lambda$ be a zero of $p$.

a. Write the companion matrix for $p$.

b. Explain why $\lambda^3 = -a_0 - a_1\lambda - a_2\lambda^2$, and show that $(1, \lambda, \lambda^2)$ is an eigenvector of the companion matrix for $p$.

23. Let $p$ be the polynomial in Exercise 22, and suppose the equation $p(t) = 0$ has distinct roots $\lambda_1, \lambda_2, \lambda_3$. Let $V$ be the Vandermonde matrix

$$V = \begin{bmatrix} 1 & 1 & 1 \\ \lambda_1 & \lambda_2 & \lambda_3 \\ \lambda_1^2 & \lambda_2^2 & \lambda_3^2 \end{bmatrix}$$

(The transpose of $V$ was considered in Supplementary Exercise 11 in Chapter 2.) Use Exercise 22 and a theorem from this chapter to deduce that $V$ is invertible (but do not compute $V^{-1}$). Then explain why $V^{-1}C_pV$ is a diagonal matrix.

24. [M] The MATLAB command roots(p) computes the roots of the polynomial equation $p(t) = 0$. Read a MATLAB manual, and then describe the basic idea behind the algorithm for the roots command.

25. [M] Use a matrix program to diagonalize

$$A = \begin{bmatrix} -3 & -2 & 0 \\ 14 & 7 & -1 \\ -6 & -3 & 1 \end{bmatrix}$$

if possible. Use the eigenvalue command to create the diagonal matrix $D$. If the program has a command that produces eigenvectors, use it to create an invertible matrix $P$. Then compute $AP - PD$ and $PDP^{-1}$. Discuss your results.


26. [M] Repeat Exercise 25 for A = 



5 

2 

-8 

-2 

































Orthogonality and 
Least Squares 


INTRODUCTORY EXAMPLE 

The North American Datum 
and GPS Navigation 



Imagine starting a massive project that you estimate will 
take ten years and require the efforts of scores of people 
to construct and solve a 1,800,000-by-900,000 system 
of linear equations. That is exactly what the National 
Geodetic Survey did in 1974, when it set out to update 
the North American Datum (NAD) — a network of 268,000 
precisely located reference points that span the entire North 
American continent, together with Greenland, Hawaii, the 
Virgin Islands, Puerto Rico, and other Caribbean islands. 

The recorded latitudes and longitudes in the NAD 
must be determined to within a few centimeters because 
they form the basis for all surveys, maps, legal property 
boundaries, and layouts of civil engineering projects 
such as highways and public utility lines. However, 
more than 200,000 new points had been added to the 
datum since the last adjustment in 1927, and errors had 
gradually accumulated over the years, due to imprecise 
measurements and shifts in the earth’s crust. Data gathering 
for the NAD readjustment was completed in 1983. 

The system of equations for the NAD had no solution in the ordinary sense, but rather had a least-squares solution, which assigned latitudes and longitudes to the reference points in a way that corresponded best to the 1.8 million observations. The least-squares solution was found in 1986 by solving a related system of so-called normal equations, which involved 928,735 equations in 928,735 variables. 1

More recently, knowledge of reference points on the 
ground has become crucial for accurately determining 
the locations of satellites in the satellite-based Global 
Positioning System (GPS). A GPS satellite calculates its 
position relative to the earth by measuring the time it takes 
for signals to arrive from three ground transmitters. To do 
this, the satellites use precise atomic clocks that have been 
synchronized with ground stations (whose locations are 
known accurately because of the NAD). 

The Global Positioning System is used both for 
determining the locations of new reference points on the 
ground and for finding a user’s position on the ground 
relative to established maps. When a car driver (or a 
mountain climber) turns on a GPS receiver, the receiver 
measures the relative arrival times of signals from at 
least three satellites. This information, together with the 
transmitted data about the satellites’ locations and message 
times, is used to adjust the GPS receiver’s time and to 
determine its approximate location on the earth. Given 
information from a fourth satellite, the GPS receiver can 
even establish its approximate altitude. 

1 A mathematical discussion of the solution strategy (along with details of the entire NAD project) appears in North American Datum of 1983, Charles R. Schwarz (ed.), National Geodetic Survey, National Oceanic and Atmospheric Administration (NOAA) Professional Paper NOS 2, 1989.

Both the NAD and GPS problems are solved by finding a vector that “approximately satisfies” an inconsistent system of equations. A careful explanation of this apparent contradiction will require ideas developed in the first five sections of this chapter.


In order to find an approximate solution to an inconsistent system of equations that has 
no actual solution, a well-defined notion of nearness is needed. Section 6.1 introduces 
the concepts of distance and orthogonality in a vector space. Sections 6.2 and 6.3 show 
how orthogonality can be used to identify the point within a subspace W that is nearest 
to a point y lying outside of W. By taking W to be the column space of a matrix, 
Section 6.5 develops a method for producing approximate (“least-squares”) solutions 
for inconsistent linear systems, such as the system solved for the NAD report. 

Section 6.4 provides another opportunity to see orthogonal projections at work, 
creating a matrix factorization widely used in numerical linear algebra. The remaining 
sections examine some of the many least-squares problems that arise in applications, 
including those in vector spaces more general than $\mathbb{R}^n$. 


6.1 INNER PRODUCT, LENGTH, AND ORTHOGONALITY 


Geometric concepts of length, distance, and perpendicularity, which are well known for $\mathbb{R}^2$ and $\mathbb{R}^3$, are defined here for $\mathbb{R}^n$. These concepts provide powerful geometric tools for solving many applied problems, including the least-squares problems mentioned above. All three notions are defined in terms of the inner product of two vectors.


The Inner Product 

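If $\mathbf{u}$ and $\mathbf{v}$ are vectors in $\mathbb{R}^n$, then we regard $\mathbf{u}$ and $\mathbf{v}$ as $n \times 1$ matrices. The transpose $\mathbf{u}^T$ is a $1 \times n$ matrix, and the matrix product $\mathbf{u}^T\mathbf{v}$ is a $1 \times 1$ matrix, which we write as a single real number (a scalar) without brackets. The number $\mathbf{u}^T\mathbf{v}$ is called the inner product of $\mathbf{u}$ and $\mathbf{v}$, and often it is written as $\mathbf{u}\cdot\mathbf{v}$. This inner product, mentioned in the exercises for Section 2.1, is also referred to as a dot product. If

$$\mathbf{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \quad\text{and}\quad \mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$$

then the inner product of $\mathbf{u}$ and $\mathbf{v}$ is

$$\begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = u_1v_1 + u_2v_2 + \cdots + u_nv_n$$

EXAMPLE 1 Compute $\mathbf{u}\cdot\mathbf{v}$ and $\mathbf{v}\cdot\mathbf{u}$ for $\mathbf{u} = \begin{bmatrix} 2 \\ -5 \\ -1 \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} 3 \\ 2 \\ -3 \end{bmatrix}$.

SOLUTION

$$\mathbf{u}\cdot\mathbf{v} = \mathbf{u}^T\mathbf{v} = \begin{bmatrix} 2 & -5 & -1 \end{bmatrix}\begin{bmatrix} 3 \\ 2 \\ -3 \end{bmatrix} = (2)(3) + (-5)(2) + (-1)(-3) = -1$$

$$\mathbf{v}\cdot\mathbf{u} = \mathbf{v}^T\mathbf{u} = \begin{bmatrix} 3 & 2 & -3 \end{bmatrix}\begin{bmatrix} 2 \\ -5 \\ -1 \end{bmatrix} = (3)(2) + (2)(-5) + (-3)(-1) = -1$$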


It is clear from the calculations in Example 1 why $\mathbf{u}\cdot\mathbf{v} = \mathbf{v}\cdot\mathbf{u}$. This commutativity of the inner product holds in general. The following properties of the inner product are easily deduced from properties of the transpose operation in Section 2.1. (See Exercises 21 and 22 at the end of this section.)


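THEOREM 1

Let $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ be vectors in $\mathbb{R}^n$, and let $c$ be a scalar. Then

a. $\mathbf{u}\cdot\mathbf{v} = \mathbf{v}\cdot\mathbf{u}$

b. $(\mathbf{u} + \mathbf{v})\cdot\mathbf{w} = \mathbf{u}\cdot\mathbf{w} + \mathbf{v}\cdot\mathbf{w}$

c. $(c\mathbf{u})\cdot\mathbf{v} = c(\mathbf{u}\cdot\mathbf{v}) = \mathbf{u}\cdot(c\mathbf{v})$

d. $\mathbf{u}\cdot\mathbf{u} \ge 0$, and $\mathbf{u}\cdot\mathbf{u} = 0$ if and only if $\mathbf{u} = \mathbf{0}$

Properties (b) and (c) can be combined several times to produce the following useful rule:

$$(c_1\mathbf{u}_1 + \cdots + c_p\mathbf{u}_p)\cdot\mathbf{w} = c_1(\mathbf{u}_1\cdot\mathbf{w}) + \cdots + c_p(\mathbf{u}_p\cdot\mathbf{w})$$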
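The inner product and the properties in Theorem 1 are easy to check numerically. The sketch below uses the vectors of Example 1; the function name `dot`, the extra vector `w`, and the scalar `c` are hypothetical choices made only for illustration.

```python
def dot(u, v):
    """Inner product u . v = u_1 v_1 + ... + u_n v_n of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


u = [2, -5, -1]
v = [3, 2, -3]
w = [1, 0, 4]     # hypothetical third vector
c = 3             # hypothetical scalar

print(dot(u, v), dot(v, u))                       # property (a)
u_plus_v = [a + b for a, b in zip(u, v)]
print(dot(u_plus_v, w), dot(u, w) + dot(v, w))    # property (b)
cu = [c * a for a in u]
print(dot(cu, v), c * dot(u, v))                  # property (c)
```

Each printed pair should consist of two equal numbers, in agreement with parts (a), (b), and (c) of Theorem 1.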


The Length of a Vector 

If $\mathbf{v}$ is in $\mathbb{R}^n$, with entries $v_1, \ldots, v_n$, then the square root of $\mathbf{v}\cdot\mathbf{v}$ is defined because $\mathbf{v}\cdot\mathbf{v}$ is nonnegative.

DEFINITION

The length (or norm) of $\mathbf{v}$ is the nonnegative scalar $\|\mathbf{v}\|$ defined by

$$\|\mathbf{v}\| = \sqrt{\mathbf{v}\cdot\mathbf{v}} = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}, \quad\text{and}\quad \|\mathbf{v}\|^2 = \mathbf{v}\cdot\mathbf{v}$$

FIGURE 1 Interpretation of $\|\mathbf{v}\|$ as length.

Suppose $\mathbf{v}$ is in $\mathbb{R}^2$, say, $\mathbf{v} = \begin{bmatrix} a \\ b \end{bmatrix}$. If we identify $\mathbf{v}$ with a geometric point in the plane, as usual, then $\|\mathbf{v}\|$ coincides with the standard notion of the length of the line segment from the origin to $\mathbf{v}$. This follows from the Pythagorean Theorem applied to a triangle such as the one in Figure 1.

A similar calculation with the diagonal of a rectangular box shows that the definition of length of a vector $\mathbf{v}$ in $\mathbb{R}^3$ coincides with the usual notion of length.

For any scalar $c$, the length of $c\mathbf{v}$ is $|c|$ times the length of $\mathbf{v}$. That is,

$$\|c\mathbf{v}\| = |c|\,\|\mathbf{v}\|$$

(To see this, compute $\|c\mathbf{v}\|^2 = (c\mathbf{v})\cdot(c\mathbf{v}) = c^2\,\mathbf{v}\cdot\mathbf{v} = c^2\|\mathbf{v}\|^2$ and take square roots.)

A vector whose length is 1 is called a unit vector. If we divide a nonzero vector $\mathbf{v}$ by its length (that is, multiply by $1/\|\mathbf{v}\|$), we obtain a unit vector $\mathbf{u}$ because the length of $\mathbf{u}$ is $(1/\|\mathbf{v}\|)\|\mathbf{v}\|$. The process of creating $\mathbf{u}$ from $\mathbf{v}$ is sometimes called normalizing $\mathbf{v}$, and we say that $\mathbf{u}$ is in the same direction as $\mathbf{v}$.

Several examples that follow use the space-saving notation for (column) vectors.


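EXAMPLE 2 Let $\mathbf{v} = (1, -2, 2, 0)$. Find a unit vector $\mathbf{u}$ in the same direction as $\mathbf{v}$.

SOLUTION First, compute the length of $\mathbf{v}$:

$$\|\mathbf{v}\|^2 = \mathbf{v}\cdot\mathbf{v} = (1)^2 + (-2)^2 + (2)^2 + (0)^2 = 9, \qquad \|\mathbf{v}\| = \sqrt{9} = 3$$

Then, multiply $\mathbf{v}$ by $1/\|\mathbf{v}\|$ to obtain

$$\mathbf{u} = \frac{1}{\|\mathbf{v}\|}\mathbf{v} = \frac{1}{3}\mathbf{v} = \frac{1}{3}\begin{bmatrix} 1 \\ -2 \\ 2 \\ 0 \end{bmatrix} = \begin{bmatrix} 1/3 \\ -2/3 \\ 2/3 \\ 0 \end{bmatrix}$$

To check that $\|\mathbf{u}\| = 1$, it suffices to show that $\|\mathbf{u}\|^2 = 1$.

$$\|\mathbf{u}\|^2 = \mathbf{u}\cdot\mathbf{u} = \left(\tfrac{1}{3}\right)^2 + \left(-\tfrac{2}{3}\right)^2 + \left(\tfrac{2}{3}\right)^2 + (0)^2 = \tfrac{1}{9} + \tfrac{4}{9} + \tfrac{4}{9} + 0 = 1$$

FIGURE 2 Normalizing a vector to produce a unit vector.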


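EXAMPLE 3 Let $W$ be the subspace of $\mathbb{R}^2$ spanned by $\mathbf{x} = (\tfrac{2}{3}, 1)$. Find a unit vector $\mathbf{z}$ that is a basis for $W$.

SOLUTION $W$ consists of all multiples of $\mathbf{x}$, as in Figure 2(a). Any nonzero vector in $W$ is a basis for $W$. To simplify the calculation, “scale” $\mathbf{x}$ to eliminate fractions. That is, multiply $\mathbf{x}$ by 3 to get

$$\mathbf{y} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$$

Now compute $\|\mathbf{y}\|^2 = 2^2 + 3^2 = 13$, $\|\mathbf{y}\| = \sqrt{13}$, and normalize $\mathbf{y}$ to get

$$\mathbf{z} = \frac{1}{\sqrt{13}}\begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 2/\sqrt{13} \\ 3/\sqrt{13} \end{bmatrix}$$

See Figure 2(b). Another unit vector is $(-2/\sqrt{13}, -3/\sqrt{13})$.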
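The computations in Examples 2 and 3 can be reproduced with a short sketch. The function names `norm` and `normalize` are hypothetical, and the code is only an illustration of the definition of length and of normalizing a nonzero vector.

```python
import math


def norm(v):
    """Length of v: the square root of v . v."""
    return math.sqrt(sum(a * a for a in v))


def normalize(v):
    """Return the unit vector in the same direction as the nonzero vector v."""
    n = norm(v)
    if n == 0:
        raise ValueError("the zero vector cannot be normalized")
    return [a / n for a in v]


u = normalize([1, -2, 2, 0])   # Example 2
z = normalize([2, 3])          # Example 3, after scaling x by 3
print(norm([1, -2, 2, 0]), u)
print(z, norm(z))
```

Scaling $\mathbf{x}$ by 3 before normalizing does not change the result, because any positive multiple of $\mathbf{x}$ normalizes to the same unit vector.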


Distance in $\mathbb{R}^n$

We are ready now to describe how close one vector is to another. Recall that if $a$ and $b$ are real numbers, the distance on the number line between $a$ and $b$ is the number $|a - b|$. Two examples are shown in Figure 3. This definition of distance in $\mathbb{R}$ has a direct analogue in $\mathbb{R}^n$.

$|2 - 8| = |-6| = 6$ or $|8 - 2| = |6| = 6$ (6 units apart)

$|(-3) - 4| = |-7| = 7$ or $|4 - (-3)| = |7| = 7$ (7 units apart)

FIGURE 3 Distances in $\mathbb{R}$.

DEFINITION

For $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$, the distance between $\mathbf{u}$ and $\mathbf{v}$, written as $\operatorname{dist}(\mathbf{u}, \mathbf{v})$, is the length of the vector $\mathbf{u} - \mathbf{v}$. That is,

$$\operatorname{dist}(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\|$$

In $\mathbb{R}^2$ and $\mathbb{R}^3$, this definition of distance coincides with the usual formulas for the Euclidean distance between two points, as the next two examples show.


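EXAMPLE 4 Compute the distance between the vectors $\mathbf{u} = (7, 1)$ and $\mathbf{v} = (3, 2)$.

SOLUTION Calculate

$$\mathbf{u} - \mathbf{v} = \begin{bmatrix} 7 \\ 1 \end{bmatrix} - \begin{bmatrix} 3 \\ 2 \end{bmatrix} = \begin{bmatrix} 4 \\ -1 \end{bmatrix}, \qquad \|\mathbf{u} - \mathbf{v}\| = \sqrt{4^2 + (-1)^2} = \sqrt{17}$$

The vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{u} - \mathbf{v}$ are shown in Figure 4. When the vector $\mathbf{u} - \mathbf{v}$ is added to $\mathbf{v}$, the result is $\mathbf{u}$. Notice that the parallelogram in Figure 4 shows that the distance from $\mathbf{u}$ to $\mathbf{v}$ is the same as the distance from $\mathbf{u} - \mathbf{v}$ to $\mathbf{0}$. ■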
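As a quick check of Example 4, the following sketch computes the distance directly from the definition. The function name `dist` is a hypothetical helper, not notation from a particular library.

```python
import math


def dist(u, v):
    """Distance between u and v: the length of the vector u - v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


u = [7, 1]
v = [3, 2]
d = dist(u, v)
diff = [a - b for a, b in zip(u, v)]
print(d)                      # distance from u to v
print(dist(diff, [0, 0]))     # distance from u - v to the origin
```

Both printed values should equal $\sqrt{17}$, matching the observation about the parallelogram in Figure 4.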



FIGURE 4 The distance between u and v is 
the length of u — v. 


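EXAMPLE 5 If $\mathbf{u} = (u_1, u_2, u_3)$ and $\mathbf{v} = (v_1, v_2, v_3)$, then

$$\operatorname{dist}(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \sqrt{(\mathbf{u} - \mathbf{v})\cdot(\mathbf{u} - \mathbf{v})} = \sqrt{(u_1 - v_1)^2 + (u_2 - v_2)^2 + (u_3 - v_3)^2}$$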



Orthogonal Vectors 


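The rest of this chapter depends on the fact that the concept of perpendicular lines in ordinary Euclidean geometry has an analogue in $\mathbb{R}^n$.

Consider $\mathbb{R}^2$ or $\mathbb{R}^3$ and two lines through the origin determined by vectors $\mathbf{u}$ and $\mathbf{v}$. The two lines shown in Figure 5 are geometrically perpendicular if and only if the distance from $\mathbf{u}$ to $\mathbf{v}$ is the same as the distance from $\mathbf{u}$ to $-\mathbf{v}$. This is the same as requiring the squares of the distances to be the same. Now

$$\begin{aligned}
[\operatorname{dist}(\mathbf{u}, -\mathbf{v})]^2 &= \|\mathbf{u} - (-\mathbf{v})\|^2 = \|\mathbf{u} + \mathbf{v}\|^2 \\
&= (\mathbf{u} + \mathbf{v})\cdot(\mathbf{u} + \mathbf{v}) \\
&= \mathbf{u}\cdot(\mathbf{u} + \mathbf{v}) + \mathbf{v}\cdot(\mathbf{u} + \mathbf{v}) && \text{Theorem 1(b)} \\
&= \mathbf{u}\cdot\mathbf{u} + \mathbf{u}\cdot\mathbf{v} + \mathbf{v}\cdot\mathbf{u} + \mathbf{v}\cdot\mathbf{v} && \text{Theorem 1(a), (b)} \\
&= \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 + 2\mathbf{u}\cdot\mathbf{v} && \text{Theorem 1(a)}
\end{aligned} \tag{1}$$

The same calculations with $\mathbf{v}$ and $-\mathbf{v}$ interchanged show that

$$[\operatorname{dist}(\mathbf{u}, \mathbf{v})]^2 = \|\mathbf{u}\|^2 + \|-\mathbf{v}\|^2 + 2\mathbf{u}\cdot(-\mathbf{v}) = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - 2\mathbf{u}\cdot\mathbf{v}$$

The two squared distances are equal if and only if $2\mathbf{u}\cdot\mathbf{v} = -2\mathbf{u}\cdot\mathbf{v}$, which happens if and only if $\mathbf{u}\cdot\mathbf{v} = 0$.

This calculation shows that when vectors $\mathbf{u}$ and $\mathbf{v}$ are identified with geometric points, the corresponding lines through the points and the origin are perpendicular if and only if $\mathbf{u}\cdot\mathbf{v} = 0$. The following definition generalizes to $\mathbb{R}^n$ this notion of perpendicularity (or orthogonality, as it is commonly called in linear algebra).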


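DEFINITION

Two vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$ are orthogonal (to each other) if $\mathbf{u}\cdot\mathbf{v} = 0$.

Observe that the zero vector is orthogonal to every vector in $\mathbb{R}^n$ because $\mathbf{0}^T\mathbf{v} = 0$ for all $\mathbf{v}$.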

The next theorem provides a useful fact about orthogonal vectors. The proof follows 
immediately from the calculation in (1) above and the definition of orthogonality. The 
right triangle shown in Figure 6 provides a visualization of the lengths that appear in 
the theorem. 


THEOREM 2

The Pythagorean Theorem

Two vectors $\mathbf{u}$ and $\mathbf{v}$ are orthogonal if and only if $\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2$.

FIGURE 6
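A numerical check of the orthogonality test and of the Pythagorean Theorem is sketched below. The vectors $\mathbf{u} = (2, -1)$ and $\mathbf{v} = (1, 2)$ are hypothetical, chosen only because their inner product is zero; the helper names are likewise illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm_sq(v):
    """Squared length of v."""
    return dot(v, v)


u = [2, -1]   # hypothetical orthogonal pair: u . v = 2 - 2 = 0
v = [1, 2]

u_plus_v = [a + b for a, b in zip(u, v)]
u_minus_v = [a - b for a, b in zip(u, v)]

print(dot(u, v))                                  # orthogonality test
print(norm_sq(u_plus_v), norm_sq(u) + norm_sq(v)) # Pythagorean Theorem
print(norm_sq(u_minus_v), norm_sq(u_plus_v))      # squared distances to v and to -v
```

For an orthogonal pair, the second line should show two equal numbers, and the third line should show that $\mathbf{u}$ is equally far from $\mathbf{v}$ and from $-\mathbf{v}$.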







Orthogonal Complements 

To provide practice using inner products, we introduce a concept here that will be of use in Section 6.3 and elsewhere in the chapter. If a vector $\mathbf{z}$ is orthogonal to every vector in a subspace $W$ of $\mathbb{R}^n$, then $\mathbf{z}$ is said to be orthogonal to $W$. The set of all vectors $\mathbf{z}$ that are orthogonal to $W$ is called the orthogonal complement of $W$ and is denoted by $W^\perp$ (and read as “$W$ perpendicular” or simply “$W$ perp”).



FIGURE 7 

A plane and line through 0 as 
orthogonal complements. 


EXAMPLE 6 Let $W$ be a plane through the origin in $\mathbb{R}^3$, and let $L$ be the line through the origin and perpendicular to $W$. If $\mathbf{z}$ and $\mathbf{w}$ are nonzero, $\mathbf{z}$ is on $L$, and $\mathbf{w}$ is in $W$, then the line segment from $\mathbf{0}$ to $\mathbf{z}$ is perpendicular to the line segment from $\mathbf{0}$ to $\mathbf{w}$; that is, $\mathbf{z}\cdot\mathbf{w} = 0$. See Figure 7. So each vector on $L$ is orthogonal to every $\mathbf{w}$ in $W$. In fact, $L$ consists of all vectors that are orthogonal to the $\mathbf{w}$'s in $W$, and $W$ consists of all vectors orthogonal to the $\mathbf{z}$'s in $L$. That is,

$$L = W^\perp \quad\text{and}\quad W = L^\perp$$ ■

The following two facts about $W^\perp$, with $W$ a subspace of $\mathbb{R}^n$, are needed later in the chapter. Proofs are suggested in Exercises 29 and 30. Exercises 27-31 provide excellent practice using properties of the inner product.

1. A vector $\mathbf{x}$ is in $W^\perp$ if and only if $\mathbf{x}$ is orthogonal to every vector in a set that spans $W$.

2. $W^\perp$ is a subspace of $\mathbb{R}^n$.


The next theorem and Exercise 31 verify the claims made in Section 4.6 concerning the subspaces shown in Figure 8. (Also see Exercise 28 in Section 4.6.)

FIGURE 8 The fundamental subspaces determined by an $m \times n$ matrix $A$.

Remark: A common way to prove that two sets, say $S$ and $T$, are equal is to show that $S$ is a subset of $T$ and $T$ is a subset of $S$. The proof of the next theorem that $\operatorname{Nul} A = (\operatorname{Row} A)^\perp$ is established by showing that $\operatorname{Nul} A$ is a subset of $(\operatorname{Row} A)^\perp$ and $(\operatorname{Row} A)^\perp$ is a subset of $\operatorname{Nul} A$. That is, an arbitrary element $\mathbf{x}$ in $\operatorname{Nul} A$ is shown to be in $(\operatorname{Row} A)^\perp$, and then an arbitrary element $\mathbf{x}$ in $(\operatorname{Row} A)^\perp$ is shown to be in $\operatorname{Nul} A$.

THEOREM 3

Let $A$ be an $m \times n$ matrix. The orthogonal complement of the row space of $A$ is the null space of $A$, and the orthogonal complement of the column space of $A$ is the null space of $A^T$:

$$(\operatorname{Row} A)^\perp = \operatorname{Nul} A \quad\text{and}\quad (\operatorname{Col} A)^\perp = \operatorname{Nul} A^T$$

PROOF The row-column rule for computing $A\mathbf{x}$ shows that if $\mathbf{x}$ is in $\operatorname{Nul} A$, then $\mathbf{x}$ is orthogonal to each row of $A$ (with the rows treated as vectors in $\mathbb{R}^n$). Since the rows of $A$ span the row space, $\mathbf{x}$ is orthogonal to $\operatorname{Row} A$. Conversely, if $\mathbf{x}$ is orthogonal to $\operatorname{Row} A$, then $\mathbf{x}$ is certainly orthogonal to each row of $A$, and hence $A\mathbf{x} = \mathbf{0}$. This proves the first statement of the theorem. Since this statement is true for any matrix, it is true for $A^T$. That is, the orthogonal complement of the row space of $A^T$ is the null space of $A^T$. This proves the second statement, because $\operatorname{Row} A^T = \operatorname{Col} A$. ■
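The first statement of Theorem 3 can be illustrated with a small sketch. The $2 \times 3$ matrix and the vector $\mathbf{x}$ below are hypothetical, chosen so that $A\mathbf{x} = \mathbf{0}$; the code then checks that $\mathbf{x}$ is orthogonal to each row of $A$ and to one particular linear combination of the rows.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


A = [[1, 0, 2],
     [0, 1, -1]]      # hypothetical 2 x 3 matrix
x = [-2, 1, 1]        # chosen so that A x = 0, i.e. x is in Nul A

# By the row-column rule, entry i of Ax is (row i of A) . x
Ax = [dot(row, x) for row in A]

# A vector in Row A: 3*(row 1) - 4*(row 2)
combo = [3 * a - 4 * b for a, b in zip(A[0], A[1])]

print(Ax)              # both entries are zero
print(dot(combo, x))   # x is orthogonal to this vector in Row A as well
```

Because every vector in the row space is a linear combination of the rows, orthogonality to each row is enough to give orthogonality to all of $\operatorname{Row} A$.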


Angles in R 2 and R 3 (Optional) 

If u and v are nonzero vectors in either $\mathbb{R}^2$ or $\mathbb{R}^3$, then there is a nice connection between
their inner product and the angle $\vartheta$ between the two line segments from the origin to the
points identified with u and v. The formula is

$$u \cdot v = \|u\| \, \|v\| \cos\vartheta \qquad (2)$$

FIGURE 9 The angle between two vectors.

To verify this formula for vectors in $\mathbb{R}^2$, consider the triangle shown in Figure 9, with
sides of lengths $\|u\|$, $\|v\|$, and $\|u - v\|$. By the law of cosines,

$$\|u - v\|^2 = \|u\|^2 + \|v\|^2 - 2\|u\| \, \|v\| \cos\vartheta$$

which can be rearranged to produce

$$\begin{aligned}
\|u\| \, \|v\| \cos\vartheta &= \tfrac{1}{2}\left[ \|u\|^2 + \|v\|^2 - \|u - v\|^2 \right] \\
&= \tfrac{1}{2}\left[ u_1^2 + u_2^2 + v_1^2 + v_2^2 - (u_1 - v_1)^2 - (u_2 - v_2)^2 \right] \\
&= u_1 v_1 + u_2 v_2 \\
&= u \cdot v
\end{aligned}$$

The verification for $\mathbb{R}^3$ is similar. When n > 3, formula (2) may be used to define the
angle between two vectors in $\mathbb{R}^n$. In statistics, for instance, the value of $\cos\vartheta$ defined
by (2) for suitable vectors u and v is what statisticians call a correlation coefficient.
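
Formula (2) translates directly into a short computation. The sketch below assumes NumPy; the function name `cos_angle` and the sample vectors are illustrative choices rather than anything defined in the text.

```python
import numpy as np

def cos_angle(u, v):
    """Cosine of the angle between nonzero vectors u and v, solved from formula (2)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

c_perp = cos_angle([1, 0], [0, 1])   # perpendicular vectors
c_45 = cos_angle([1, 1], [1, 0])     # vectors 45 degrees apart
print(c_perp, c_45)
```

The same function works unchanged for vectors with more than three entries, which is the sense in which (2) defines an angle in higher dimensions.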


PRACTICE PROBLEMS 


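1. Let $a = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$ and $b = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$. Compute $\dfrac{a \cdot b}{a \cdot a}$ and $\left( \dfrac{a \cdot b}{a \cdot a} \right) a$.

2. Let $c = \begin{bmatrix} 4/3 \\ -1 \\ 2/3 \end{bmatrix}$ and $d = \begin{bmatrix} 5 \\ 6 \\ -1 \end{bmatrix}$.

a. Find a unit vector u in the direction of c.

b. Show that d is orthogonal to c.

c. Use the results of (a) and (b) to explain why d must be orthogonal to the unit
vector u.

3. Let W be a subspace of $\mathbb{R}^n$. Exercise 30 establishes that $W^\perp$ is also a subspace of
$\mathbb{R}^n$. Prove that $\dim W + \dim W^\perp = n$.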


6.1 EXERCISES 


Compute the quantities in Exercises 1-8 using the vectors 






r 31 


6" 

-1 

2 

, v = 

i 

i 

, w = 

i 

Ol i ' 

1_ 

, X = 

-2 

3 



u*u, v*u, and 


v*u 

u*u 




1 


w 


w*w 

U‘V 

y.y 





2 . 

4. 


x*w 

w*w, x*w, and- 

ww 

1 

-u 

U‘U 



X-w 


X • X 





In Exercises 9-12, find a unit vector in the direction of the given 
vector. 



13. Find the distance between x 


10 

-3 


and y = 




0 " 


~-4“ 

14. Find the distance between u = 

-5 

2 

and z = 

-1 

8 


Determine which pairs of vectors in Exercises 15-18 are orthogonal.


15. a = 



17. u = 



16. u = 


18. y = 


" 12" 


1 

<N 

1_ 

3 

, V = 

-3 

_ —5 _ 


i 

m 

_ 1 

~-3“ 


1 

7 

rw _ 

00 

4 

5 z — 

15 

i 

o 

i _ 


_ — 7 _ 


In Exercises 19 and 20, all vectors are in $\mathbb{R}^n$. Mark each statement
True or False. Justify each answer.

19. a. $v \cdot v = \|v\|^2$.

b. For any scalar c, $u \cdot (cv) = c(u \cdot v)$.

c. If the distance from u to v equals the distance from u to
-v, then u and v are orthogonal.

d. For a square matrix A, vectors in Col A are orthogonal to
vectors in Nul A.

e. If vectors $v_1, \ldots, v_p$ span a subspace W and if x is
orthogonal to each $v_j$ for $j = 1, \ldots, p$, then x is in $W^\perp$.

20. a. $u \cdot v - v \cdot u = 0$.

b. For any scalar c, $\|cv\| = c\|v\|$.

c. If x is orthogonal to every vector in a subspace W, then x
is in $W^\perp$.

d. If $\|u\|^2 + \|v\|^2 = \|u + v\|^2$, then u and v are orthogonal.

e. For an $m \times n$ matrix A, vectors in the null space of A are
orthogonal to vectors in the row space of A.

21. Use the transpose definition of the inner product to verify
parts (b) and (c) of Theorem 1. Mention the appropriate facts
from Chapter 2.

22. Let $u = (u_1, u_2, u_3)$. Explain why $u \cdot u \geq 0$. When is
$u \cdot u = 0$?



23. Let $u = \begin{bmatrix} 2 \\ -5 \\ -1 \end{bmatrix}$ and $v = \begin{bmatrix} -7 \\ -4 \\ 6 \end{bmatrix}$. Compute and compare
$u \cdot v$, $\|u\|^2$, $\|v\|^2$, and $\|u + v\|^2$. Do not use the Pythagorean
Theorem.


24. Verify the parallelogram law for vectors u and v in $\mathbb{R}^n$:

$$\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2$$


25. Let $v = \begin{bmatrix} a \\ b \end{bmatrix}$. Describe the set H of vectors $\begin{bmatrix} x \\ y \end{bmatrix}$ that are
orthogonal to v. [Hint: Consider v = 0 and v ≠ 0.]


26. Let $u = \begin{bmatrix} 5 \\ 6 \\ 7 \end{bmatrix}$, and let W be the set of all x in $\mathbb{R}^3$ such that
$u \cdot x = 0$. What theorem in Chapter 4 can be used to show that
W is a subspace of $\mathbb{R}^3$? Describe W in geometric language.


27. Suppose a vector y is orthogonal to vectors u and v. Show
that y is orthogonal to the vector u + v.

28. Suppose y is orthogonal to u and v. Show that y is orthogonal
to every w in Span {u, v}. [Hint: An arbitrary w in Span {u, v}
has the form $w = c_1 u + c_2 v$. Show that y is orthogonal to
such a vector w.]



Span{u, v} 


29. Let $W = \operatorname{Span}\{v_1, \ldots, v_p\}$. Show that if x is orthogonal to
each $v_j$, for $1 \leq j \leq p$, then x is orthogonal to every vector
in W.


30. Let W be a subspace of $\mathbb{R}^n$, and let $W^\perp$ be the set of all
vectors orthogonal to W. Show that $W^\perp$ is a subspace of $\mathbb{R}^n$
using the following steps.

a. Take z in $W^\perp$, and let u represent any element of W. Then
$z \cdot u = 0$. Take any scalar c and show that cz is orthogonal
to u. (Since u was an arbitrary element of W, this will
show that cz is in $W^\perp$.)

b. Take $z_1$ and $z_2$ in $W^\perp$, and let u be any element of
W. Show that $z_1 + z_2$ is orthogonal to u. What can you
conclude about $z_1 + z_2$? Why?

c. Finish the proof that $W^\perp$ is a subspace of $\mathbb{R}^n$.


31. Show that if x is in both W and $W^\perp$, then x = 0.


32. [M] Construct a pair u, v of random vectors in IR 4 , and let 




a. Denote the columns of A by $a_1, \ldots, a_4$. Compute
the length of each column, and compute $a_1 \cdot a_2$,
$a_1 \cdot a_3$, $a_1 \cdot a_4$, $a_2 \cdot a_3$, $a_2 \cdot a_4$, and $a_3 \cdot a_4$.

b. Compute and compare the lengths of u, Au, v, and Av.

c. Use equation (2) in this section to compute the cosine of
the angle between u and v. Compare this with the cosine
of the angle between Au and Av.

d. Repeat parts (b) and (c) for two other pairs of random
vectors. What do you conjecture about the effect of A on
vectors?


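33. [M] Generate random vectors x, y, and v in $\mathbb{R}^4$ with integer
entries (and v ≠ 0), and compute the quantities

$$\left( \frac{x \cdot v}{v \cdot v} \right) v, \quad \left( \frac{y \cdot v}{v \cdot v} \right) v, \quad \frac{(x + y) \cdot v}{v \cdot v} \, v, \quad \frac{(10x) \cdot v}{v \cdot v} \, v$$

Repeat the computations with new random vectors x and
y. What do you conjecture about the mapping $x \mapsto T(x) = \left( \dfrac{x \cdot v}{v \cdot v} \right) v$
(for v ≠ 0)? Verify your conjecture algebraically.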



34. [M] Let $A = \begin{bmatrix} -6 & 3 & -27 & -33 & -13 \\ 6 & -5 & 25 & 28 & 14 \\ 8 & -6 & 34 & 38 & 18 \\ 12 & -10 & 50 & 41 & 23 \\ 14 & -21 & 49 & 29 & 33 \end{bmatrix}$. Construct
a matrix N whose columns form a basis for Nul A, and
construct a matrix R whose rows form a basis for Row A (see
Section 4.6 for details). Perform a matrix computation with
N and R that illustrates a fact from Theorem 3.
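SOLUTIONS TO PRACTICE PROBLEMS

1. $a \cdot b = 7$, $a \cdot a = 5$. Hence $\dfrac{a \cdot b}{a \cdot a} = \dfrac{7}{5}$, and $\left( \dfrac{a \cdot b}{a \cdot a} \right) a = \dfrac{7}{5} a$.

2. a. Scale c, multiplying by 3 to get $y = \begin{bmatrix} 4 \\ -3 \\ 2 \end{bmatrix}$. Compute $\|y\|^2 = 29$ and $\|y\| = \sqrt{29}$.
The unit vector in the direction of both c and y is $u = \dfrac{1}{\|y\|} y = \begin{bmatrix} 4/\sqrt{29} \\ -3/\sqrt{29} \\ 2/\sqrt{29} \end{bmatrix}$.

b. d is orthogonal to c, because

$$d \cdot c = \begin{bmatrix} 5 \\ 6 \\ -1 \end{bmatrix} \cdot \begin{bmatrix} 4/3 \\ -1 \\ 2/3 \end{bmatrix} = \frac{20}{3} - 6 - \frac{2}{3} = 0$$

c. d is orthogonal to u, because u has the form kc for some k, and

$$d \cdot u = d \cdot (kc) = k(d \cdot c) = k(0) = 0$$

3. If W ≠ {0}, let $\{b_1, \ldots, b_p\}$ be a basis for W, where $1 \leq p \leq n$. Let A be the $p \times n$
matrix having rows $b_1^T, \ldots, b_p^T$. It follows that W is the row space of A. Theorem 3
implies that $W^\perp = (\operatorname{Row} A)^\perp = \operatorname{Nul} A$ and hence $\dim W^\perp = \dim \operatorname{Nul} A$. Thus,
$\dim W + \dim W^\perp = \dim \operatorname{Row} A + \dim \operatorname{Nul} A = \operatorname{rank} A + \dim \operatorname{Nul} A = n$, by the
Rank Theorem. If W = {0}, then $W^\perp = \mathbb{R}^n$, and the result follows.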


6.2 ORTHOGONAL SETS 



FIGURE 1 


A set of vectors $\{u_1, \ldots, u_p\}$ in $\mathbb{R}^n$ is said to be an orthogonal set if each pair of distinct
vectors from the set is orthogonal, that is, if $u_i \cdot u_j = 0$ whenever $i \neq j$.

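EXAMPLE 1 Show that $\{u_1, u_2, u_3\}$ is an orthogonal set, where

$$u_1 = \begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix}, \quad u_2 = \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}, \quad u_3 = \begin{bmatrix} -1/2 \\ -2 \\ 7/2 \end{bmatrix}$$

SOLUTION Consider the three possible pairs of distinct vectors, namely, $\{u_1, u_2\}$,
$\{u_1, u_3\}$, and $\{u_2, u_3\}$.

$$\begin{aligned}
u_1 \cdot u_2 &= 3(-1) + 1(2) + 1(1) = 0 \\
u_1 \cdot u_3 &= 3\left(-\tfrac{1}{2}\right) + 1(-2) + 1\left(\tfrac{7}{2}\right) = 0 \\
u_2 \cdot u_3 &= -1\left(-\tfrac{1}{2}\right) + 2(-2) + 1\left(\tfrac{7}{2}\right) = 0
\end{aligned}$$

Each pair of distinct vectors is orthogonal, and so $\{u_1, u_2, u_3\}$ is an orthogonal set. See
Figure 1; the three line segments there are mutually perpendicular. ■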
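
The pairwise check in Example 1 can be automated. The following is a small sketch, assuming NumPy; the function name `is_orthogonal_set` and the tolerance are illustrative assumptions. It tests every pair of distinct vectors, exactly as the definition of an orthogonal set requires.

```python
import numpy as np
from itertools import combinations

def is_orthogonal_set(vectors, tol=1e-12):
    """True when every pair of distinct vectors has (numerically) zero inner product."""
    return all(abs(float(np.dot(a, b))) <= tol for a, b in combinations(vectors, 2))

# The three vectors of Example 1
u1 = np.array([3.0, 1.0, 1.0])
u2 = np.array([-1.0, 2.0, 1.0])
u3 = np.array([-0.5, -2.0, 3.5])
result = is_orthogonal_set([u1, u2, u3])
print(result)
```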


THEOREM 4


If $S = \{u_1, \ldots, u_p\}$ is an orthogonal set of nonzero vectors in $\mathbb{R}^n$, then S is
linearly independent and hence is a basis for the subspace spanned by S.


PROOF If $0 = c_1 u_1 + \cdots + c_p u_p$ for some scalars $c_1, \ldots, c_p$, then

$$\begin{aligned}
0 = 0 \cdot u_1 &= (c_1 u_1 + c_2 u_2 + \cdots + c_p u_p) \cdot u_1 \\
&= (c_1 u_1) \cdot u_1 + (c_2 u_2) \cdot u_1 + \cdots + (c_p u_p) \cdot u_1 \\
&= c_1 (u_1 \cdot u_1) + c_2 (u_2 \cdot u_1) + \cdots + c_p (u_p \cdot u_1) \\
&= c_1 (u_1 \cdot u_1)
\end{aligned}$$

because $u_1$ is orthogonal to $u_2, \ldots, u_p$. Since $u_1$ is nonzero, $u_1 \cdot u_1$ is not zero and so
$c_1 = 0$. Similarly, $c_2, \ldots, c_p$ must be zero. Thus S is linearly independent. ■


DEFINITION


An orthogonal basis for a subspace W of $\mathbb{R}^n$ is a basis for W that is also an
orthogonal set.


The next theorem suggests why an orthogonal basis is much nicer than other bases.
The weights in a linear combination can be computed easily.


THEOREM 5


Let $\{u_1, \ldots, u_p\}$ be an orthogonal basis for a subspace W of $\mathbb{R}^n$. For each y in W,
the weights in the linear combination

$$y = c_1 u_1 + \cdots + c_p u_p$$

are given by

$$c_j = \frac{y \cdot u_j}{u_j \cdot u_j} \qquad (j = 1, \ldots, p)$$


PROOF As in the preceding proof, the orthogonality of $\{u_1, \ldots, u_p\}$ shows that

$$y \cdot u_1 = (c_1 u_1 + c_2 u_2 + \cdots + c_p u_p) \cdot u_1 = c_1 (u_1 \cdot u_1)$$

Since $u_1 \cdot u_1$ is not zero, the equation above can be solved for $c_1$. To find $c_j$ for
$j = 2, \ldots, p$, compute $y \cdot u_j$ and solve for $c_j$. ■


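EXAMPLE 2 The set $S = \{u_1, u_2, u_3\}$ in Example 1 is an orthogonal basis for $\mathbb{R}^3$.
Express the vector $y = \begin{bmatrix} 6 \\ 1 \\ -8 \end{bmatrix}$ as a linear combination of the vectors in S.

SOLUTION Compute

$$\begin{aligned}
y \cdot u_1 &= 11, & y \cdot u_2 &= -12, & y \cdot u_3 &= -33 \\
u_1 \cdot u_1 &= 11, & u_2 \cdot u_2 &= 6, & u_3 \cdot u_3 &= 33/2
\end{aligned}$$

By Theorem 5,

$$\begin{aligned}
y &= \frac{y \cdot u_1}{u_1 \cdot u_1} u_1 + \frac{y \cdot u_2}{u_2 \cdot u_2} u_2 + \frac{y \cdot u_3}{u_3 \cdot u_3} u_3 \\
&= \frac{11}{11} u_1 + \frac{-12}{6} u_2 + \frac{-33}{33/2} u_3 \\
&= u_1 - 2u_2 - 2u_3
\end{aligned}$$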


Notice how easy it is to compute the weights needed to build y from an orthogonal 
basis. If the basis were not orthogonal, it would be necessary to solve a system of linear 
equations in order to find the weights, as in Chapter 1. 
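
The weight formula of Theorem 5 needs only inner products, which the sketch below illustrates for the data of Example 2. NumPy is assumed, and the helper name `orthogonal_weights` is a hypothetical choice made for this illustration.

```python
import numpy as np

def orthogonal_weights(y, basis):
    """Weights c_j = (y . u_j) / (u_j . u_j) relative to an orthogonal basis."""
    return np.array([np.dot(y, u) / np.dot(u, u) for u in basis])

basis = [np.array([3.0, 1.0, 1.0]),
         np.array([-1.0, 2.0, 1.0]),
         np.array([-0.5, -2.0, 3.5])]
y = np.array([6.0, 1.0, -8.0])

c = orthogonal_weights(y, basis)
rebuilt = sum(cj * uj for cj, uj in zip(c, basis))   # recombine to recover y
print(c, rebuilt)
```

No linear system is solved here. The formula is valid only because the basis is orthogonal; for a general basis one would have to fall back on something like `np.linalg.solve`.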

We turn next to a construction that will become a key step in many calculations 
involving orthogonality, and it will lead to a geometric interpretation of Theorem 5. 
An Orthogonal Projection

Given a nonzero vector u in $\mathbb{R}^n$, consider the problem of decomposing a vector y in $\mathbb{R}^n$
into the sum of two vectors, one a multiple of u and the other orthogonal to u. We wish
to write

$$y = \hat{y} + z \qquad (1)$$

where $\hat{y} = \alpha u$ for some scalar $\alpha$ and z is some vector orthogonal to u. See Figure 2.
Given any scalar $\alpha$, let $z = y - \alpha u$, so that (1) is satisfied. Then $y - \hat{y}$ is orthogonal to
u if and only if

$$0 = (y - \alpha u) \cdot u = y \cdot u - (\alpha u) \cdot u = y \cdot u - \alpha (u \cdot u)$$

FIGURE 2
Finding $\alpha$ to make $y - \hat{y}$ orthogonal to u.

That is, (1) is satisfied with z orthogonal to u if and only if $\alpha = \dfrac{y \cdot u}{u \cdot u}$ and $\hat{y} = \dfrac{y \cdot u}{u \cdot u} u$.
The vector $\hat{y}$ is called the orthogonal projection of y onto u, and the vector z is called
the component of y orthogonal to u.

If c is any nonzero scalar and if u is replaced by cu in the definition of $\hat{y}$, then the
orthogonal projection of y onto cu is exactly the same as the orthogonal projection of
y onto u (Exercise 31). Hence this projection is determined by the subspace L spanned
by u (the line through u and 0). Sometimes $\hat{y}$ is denoted by $\operatorname{proj}_L y$ and is called the
orthogonal projection of y onto L. That is,

$$\hat{y} = \operatorname{proj}_L y = \frac{y \cdot u}{u \cdot u} u \qquad (2)$$
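
The projection formula is short enough to sketch in code. The example below assumes NumPy; `project_onto_line` is a hypothetical helper name, and the vectors (7, 6) and (4, 2) are the ones used in Example 3. The last line checks the claim that replacing u by a nonzero multiple cu leaves the projection unchanged.

```python
import numpy as np

def project_onto_line(y, u):
    """Orthogonal projection of y onto the line spanned by the nonzero vector u."""
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    return (np.dot(y, u) / np.dot(u, u)) * u

y = np.array([7.0, 6.0])
u = np.array([4.0, 2.0])

y_hat = project_onto_line(y, u)               # multiple of u
z = y - y_hat                                 # component of y orthogonal to u
y_hat_scaled = project_onto_line(y, -3.0 * u) # same line, different spanning vector
print(y_hat, z, y_hat_scaled)
```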



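EXAMPLE 3 Let $y = \begin{bmatrix} 7 \\ 6 \end{bmatrix}$ and $u = \begin{bmatrix} 4 \\ 2 \end{bmatrix}$. Find the orthogonal projection of y onto
u. Then write y as the sum of two orthogonal vectors, one in Span {u} and one orthogonal
to u.

SOLUTION Compute

$$y \cdot u = \begin{bmatrix} 7 \\ 6 \end{bmatrix} \cdot \begin{bmatrix} 4 \\ 2 \end{bmatrix} = 40, \qquad u \cdot u = \begin{bmatrix} 4 \\ 2 \end{bmatrix} \cdot \begin{bmatrix} 4 \\ 2 \end{bmatrix} = 20$$

The orthogonal projection of y onto u is

$$\hat{y} = \frac{y \cdot u}{u \cdot u} u = \frac{40}{20} u = 2 \begin{bmatrix} 4 \\ 2 \end{bmatrix} = \begin{bmatrix} 8 \\ 4 \end{bmatrix}$$

and the component of y orthogonal to u is

$$y - \hat{y} = \begin{bmatrix} 7 \\ 6 \end{bmatrix} - \begin{bmatrix} 8 \\ 4 \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \end{bmatrix}$$

The sum of these two vectors is y. That is,

$$\underbrace{\begin{bmatrix} 7 \\ 6 \end{bmatrix}}_{y} = \underbrace{\begin{bmatrix} 8 \\ 4 \end{bmatrix}}_{\hat{y}} + \underbrace{\begin{bmatrix} -1 \\ 2 \end{bmatrix}}_{(y - \hat{y})}$$

This decomposition of y is illustrated in Figure 3. Note: If the calculations above are
correct, then $\{\hat{y}, y - \hat{y}\}$ will be an orthogonal set. As a check, compute

$$\hat{y} \cdot (y - \hat{y}) = \begin{bmatrix} 8 \\ 4 \end{bmatrix} \cdot \begin{bmatrix} -1 \\ 2 \end{bmatrix} = -8 + 8 = 0$$

■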



















































FIGURE 3 The orthogonal projection of y onto a 
line L through the origin. 


Since the line segment in Figure 3 between y and $\hat{y}$ is perpendicular to L, by
construction of $\hat{y}$, the point identified with $\hat{y}$ is the closest point of L to y. (This can be proved
from geometry. We will assume this for $\mathbb{R}^2$ now and prove it for $\mathbb{R}^n$ in Section 6.3.)

EXAMPLE 4 Find the distance in Figure 3 from y to L.

SOLUTION The distance from y to L is the length of the perpendicular line segment
from y to the orthogonal projection $\hat{y}$. This length equals the length of $y - \hat{y}$. Thus the
distance is

$$\|y - \hat{y}\| = \sqrt{(-1)^2 + 2^2} = \sqrt{5}$$ ■


A Geometric Interpretation of Theorem 5 


The formula for the orthogonal projection $\hat{y}$ in (2) has the same appearance as each of the
terms in Theorem 5. Thus Theorem 5 decomposes a vector y into a sum of orthogonal
projections onto one-dimensional subspaces.

It is easy to visualize the case in which $W = \mathbb{R}^2 = \operatorname{Span}\{u_1, u_2\}$, with $u_1$ and $u_2$
orthogonal. Any y in $\mathbb{R}^2$ can be written in the form

$$y = \frac{y \cdot u_1}{u_1 \cdot u_1} u_1 + \frac{y \cdot u_2}{u_2 \cdot u_2} u_2 \qquad (3)$$

The first term in (3) is the projection of y onto the subspace spanned by $u_1$ (the line
through $u_1$ and the origin), and the second term is the projection of y onto the subspace
spanned by $u_2$. Thus (3) expresses y as the sum of its projections onto the (orthogonal)
axes determined by $u_1$ and $u_2$. See Figure 4.



FIGURE 4 A vector decomposed into 
the sum of two projections. 
Theorem 5 decomposes each y in $\operatorname{Span}\{u_1, \ldots, u_p\}$ into the sum of p projections
onto one-dimensional subspaces that are mutually orthogonal.

Decomposing a Force into Component Forces 

The decomposition in Figure 4 can occur in physics when some sort of force is applied to 
an object. Choosing an appropriate coordinate system allows the force to be represented 
by a vector y in $\mathbb{R}^2$ or $\mathbb{R}^3$. Often the problem involves some particular direction of
interest, which is represented by another vector u. For instance, if the object is moving 
in a straight line when the force is applied, the vector u might point in the direction of 
movement, as in Figure 5. A key step in the problem is to decompose the force into 
a component in the direction of u and a component orthogonal to u. The calculations 
would be analogous to those made in Example 3 above. 



FIGURE 5 



Orthonormal Sets 

A set $\{u_1, \ldots, u_p\}$ is an orthonormal set if it is an orthogonal set of unit vectors. If W
is the subspace spanned by such a set, then $\{u_1, \ldots, u_p\}$ is an orthonormal basis for
W, since the set is automatically linearly independent, by Theorem 4.

The simplest example of an orthonormal set is the standard basis $\{e_1, \ldots, e_n\}$ for
$\mathbb{R}^n$. Any nonempty subset of $\{e_1, \ldots, e_n\}$ is orthonormal, too. Here is a more
complicated example.

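EXAMPLE 5 Show that $\{v_1, v_2, v_3\}$ is an orthonormal basis of $\mathbb{R}^3$, where

$$v_1 = \begin{bmatrix} 3/\sqrt{11} \\ 1/\sqrt{11} \\ 1/\sqrt{11} \end{bmatrix}, \quad v_2 = \begin{bmatrix} -1/\sqrt{6} \\ 2/\sqrt{6} \\ 1/\sqrt{6} \end{bmatrix}, \quad v_3 = \begin{bmatrix} -1/\sqrt{66} \\ -4/\sqrt{66} \\ 7/\sqrt{66} \end{bmatrix}$$

SOLUTION Compute

$$\begin{aligned}
v_1 \cdot v_2 &= -3/\sqrt{66} + 2/\sqrt{66} + 1/\sqrt{66} = 0 \\
v_1 \cdot v_3 &= -3/\sqrt{726} - 4/\sqrt{726} + 7/\sqrt{726} = 0 \\
v_2 \cdot v_3 &= 1/\sqrt{396} - 8/\sqrt{396} + 7/\sqrt{396} = 0
\end{aligned}$$

Thus $\{v_1, v_2, v_3\}$ is an orthogonal set. Also,

$$\begin{aligned}
v_1 \cdot v_1 &= 9/11 + 1/11 + 1/11 = 1 \\
v_2 \cdot v_2 &= 1/6 + 4/6 + 1/6 = 1 \\
v_3 \cdot v_3 &= 1/66 + 16/66 + 49/66 = 1
\end{aligned}$$

which shows that $v_1$, $v_2$, and $v_3$ are unit vectors. Thus $\{v_1, v_2, v_3\}$ is an orthonormal set.
Since the set is linearly independent, its three vectors form a basis for $\mathbb{R}^3$. See Figure 6.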


FIGURE 6 
When the vectors in an orthogonal set of nonzero vectors are normalized to have
unit length, the new vectors will still be orthogonal, and hence the new set will be
an orthonormal set. See Exercise 32. It is easy to check that the vectors in Figure 6
(Example 5) are simply the unit vectors in the directions of the vectors in Figure 1
(Example 1).
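
Normalizing an orthogonal set is a one-line operation in practice. The sketch below assumes NumPy and uses a hypothetical helper `normalize_columns`; the columns of the sample matrix are the orthogonal vectors of Example 1. The matrix of pairwise inner products of the normalized columns should be the identity.

```python
import numpy as np

def normalize_columns(A):
    """Divide each (nonzero) column of A by its length."""
    A = np.asarray(A, dtype=float)
    return A / np.linalg.norm(A, axis=0)

# Columns are the orthogonal vectors u1, u2, u3 of Example 1
A = np.array([[3.0, -1.0, -0.5],
              [1.0,  2.0, -2.0],
              [1.0,  1.0,  3.5]])
U = normalize_columns(A)
gram = U.T @ U        # entry (i, j) is the inner product of columns i and j
print(np.round(gram, 12))
```

The first column of `U` is (3, 1, 1) divided by the square root of 11, which matches the first vector of Example 5.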

Matrices whose columns form an orthonormal set are important in applications and 
in computer algorithms for matrix computations. Their main properties are given in 
Theorems 6 and 7. 


THEOREM 6 


An $m \times n$ matrix U has orthonormal columns if and only if $U^T U = I$.


PROOF To simplify notation, we suppose that U has only three columns, each a vector
in $\mathbb{R}^m$. The proof of the general case is essentially the same. Let $U = [\, u_1 \;\; u_2 \;\; u_3 \,]$
and compute

$$U^T U = \begin{bmatrix} u_1^T \\ u_2^T \\ u_3^T \end{bmatrix} \begin{bmatrix} u_1 & u_2 & u_3 \end{bmatrix} = \begin{bmatrix} u_1^T u_1 & u_1^T u_2 & u_1^T u_3 \\ u_2^T u_1 & u_2^T u_2 & u_2^T u_3 \\ u_3^T u_1 & u_3^T u_2 & u_3^T u_3 \end{bmatrix} \qquad (4)$$

The entries in the matrix at the right are inner products, using transpose notation. The
columns of U are orthogonal if and only if

$$u_1^T u_2 = u_2^T u_1 = 0, \quad u_1^T u_3 = u_3^T u_1 = 0, \quad u_2^T u_3 = u_3^T u_2 = 0 \qquad (5)$$

The columns of U all have unit length if and only if

$$u_1^T u_1 = 1, \quad u_2^T u_2 = 1, \quad u_3^T u_3 = 1 \qquad (6)$$

The theorem follows immediately from (4)-(6). ■


THEOREM 7 


Let U be an $m \times n$ matrix with orthonormal columns, and let x and y be in $\mathbb{R}^n$.
Then


a. $\|Ux\| = \|x\|$

b. $(Ux) \cdot (Uy) = x \cdot y$

c. $(Ux) \cdot (Uy) = 0$ if and only if $x \cdot y = 0$


Properties (a) and (c) say that the linear mapping $x \mapsto Ux$ preserves lengths and
orthogonality. These properties are crucial for many computer algorithms. See Exercise 25
for the proof of Theorem 7.
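
A numerical spot check of properties (a) and (b) is sketched below. It assumes NumPy; the 3 x 2 matrix is one example of a matrix with orthonormal columns, and the random test vectors and the seed are arbitrary choices. A check like this illustrates the theorem but does not replace the proof.

```python
import numpy as np

# A 3 x 2 matrix with orthonormal columns
U = np.array([[1 / np.sqrt(2),  2 / 3],
              [1 / np.sqrt(2), -2 / 3],
              [0.0,             1 / 3]])

rng = np.random.default_rng(0)      # fixed seed so the run is repeatable
x = rng.standard_normal(2)
y = rng.standard_normal(2)

length_preserved = np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x))
dot_preserved = np.isclose((U @ x) @ (U @ y), x @ y)
print(length_preserved, dot_preserved)
```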


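EXAMPLE 6 Let $U = \begin{bmatrix} 1/\sqrt{2} & 2/3 \\ 1/\sqrt{2} & -2/3 \\ 0 & 1/3 \end{bmatrix}$ and $x = \begin{bmatrix} \sqrt{2} \\ 3 \end{bmatrix}$. Notice that U has
orthonormal columns and

$$U^T U = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 2/3 & -2/3 & 1/3 \end{bmatrix} \begin{bmatrix} 1/\sqrt{2} & 2/3 \\ 1/\sqrt{2} & -2/3 \\ 0 & 1/3 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

Verify that $\|Ux\| = \|x\|$.

SOLUTION

$$Ux = \begin{bmatrix} 1/\sqrt{2} & 2/3 \\ 1/\sqrt{2} & -2/3 \\ 0 & 1/3 \end{bmatrix} \begin{bmatrix} \sqrt{2} \\ 3 \end{bmatrix} = \begin{bmatrix} 3 \\ -1 \\ 1 \end{bmatrix}$$

$$\|Ux\| = \sqrt{9 + 1 + 1} = \sqrt{11}$$

$$\|x\| = \sqrt{2 + 9} = \sqrt{11}$$ ■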



Theorems 6 and 7 are particularly useful when applied to square matrices. An
orthogonal matrix is a square invertible matrix U such that $U^{-1} = U^T$. By Theorem 6,
such a matrix has orthonormal columns.$^1$ It is easy to see that any square matrix
with orthonormal columns is an orthogonal matrix. Surprisingly, such a matrix must
have orthonormal rows, too. See Exercises 27 and 28. Orthogonal matrices will appear
frequently in Chapter 7.


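EXAMPLE 7 The matrix

$$U = \begin{bmatrix} 3/\sqrt{11} & -1/\sqrt{6} & -1/\sqrt{66} \\ 1/\sqrt{11} & 2/\sqrt{6} & -4/\sqrt{66} \\ 1/\sqrt{11} & 1/\sqrt{6} & 7/\sqrt{66} \end{bmatrix}$$

is an orthogonal matrix because it is square and because its columns are orthonormal,
by Example 5. Verify that the rows are orthonormal, too! ■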
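
The row check suggested in Example 7 can be carried out numerically. This is a sketch assuming NumPy, with the matrix entries taken from the vectors of Example 5; the variable names are illustrative. Orthonormal columns correspond to the product of the transpose with the matrix being the identity, orthonormal rows to the product in the other order, and the determinant comes out as 1 or -1 (compare Practice Problem 4).

```python
import numpy as np

s11, s6, s66 = np.sqrt(11), np.sqrt(6), np.sqrt(66)
U = np.array([[3 / s11, -1 / s6, -1 / s66],
              [1 / s11,  2 / s6, -4 / s66],
              [1 / s11,  1 / s6,  7 / s66]])

cols_ok = np.allclose(U.T @ U, np.eye(3))   # columns orthonormal
rows_ok = np.allclose(U @ U.T, np.eye(3))   # rows orthonormal
det_U = np.linalg.det(U)
print(cols_ok, rows_ok, det_U)
```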


PRACTICE PROBLEMS 


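1. Let $u_1 = \begin{bmatrix} -1/\sqrt{5} \\ 2/\sqrt{5} \end{bmatrix}$ and $u_2 = \begin{bmatrix} 2/\sqrt{5} \\ 1/\sqrt{5} \end{bmatrix}$. Show that $\{u_1, u_2\}$ is an orthonormal basis
for $\mathbb{R}^2$.


2. Let y and L be as in Example 3 and Figure 3. Compute the orthogonal projection $\hat{y}$
of y onto L using $u = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$ instead of the u in Example 3.


3. Let U and x be as in Example 6, and let $y = \begin{bmatrix} -3\sqrt{2} \\ 6 \end{bmatrix}$. Verify that $Ux \cdot Uy = x \cdot y$.


4. Let U be an $n \times n$ matrix with orthonormal columns. Show that $\det U = \pm 1$.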


6.2 EXERCISES 


In Exercises 1-6, determine which sets of vectors are orthogonal. 





In Exercises 7-10, show that $\{u_1, u_2\}$ or $\{u_1, u_2, u_3\}$ is an orthogonal
basis for $\mathbb{R}^2$ or $\mathbb{R}^3$, respectively. Then express x as a linear
combination of the u's.



1 A better name might be orthonormal matrix , and this term is found in some statistics texts. However, 
orthogonal matrix is the standard term in linear algebra. 
8. Ui = 


9. Ui = 


3 

1 

1 

0 

1 


,u 2 = 


2 

6 


, and x = 


,u 2 = 


1 

4 

1 


, u 3 = 


10. Ui = 


3 

3 

0 


,u 2 = 


2 

2 

1 


,u 3 = 


-6 

3 

2 
1 

-2 

"1 
1 
4 



8" 

, and x = 

-4 


_ —3 _ 

—| 

5" 

, and x = 

-3 


1 


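11. Compute the orthogonal projection of $\begin{bmatrix} 1 \\ 7 \end{bmatrix}$ onto the line
through $\begin{bmatrix} -4 \\ 2 \end{bmatrix}$ and the origin.

12. Compute the orthogonal projection of $\begin{bmatrix} 1 \\ -1 \end{bmatrix}$ onto the line
through $\begin{bmatrix} 1 \\ 3 \end{bmatrix}$ and the origin.

13. Let $y = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$ and $u = \begin{bmatrix} 4 \\ -7 \end{bmatrix}$. Write y as the sum of two
orthogonal vectors, one in Span {u} and one orthogonal to u.

14. Let $y = \begin{bmatrix} 2 \\ 6 \end{bmatrix}$ and $u = \begin{bmatrix} 7 \\ 1 \end{bmatrix}$. Write y as the sum of a vector
in Span {u} and a vector orthogonal to u.

15. Let $y = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$ and $u = \begin{bmatrix} 8 \\ 6 \end{bmatrix}$. Compute the distance from y to
the line through u and the origin.

16. Let $y = \begin{bmatrix} -3 \\ 9 \end{bmatrix}$ and $u = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$. Compute the distance from y
to the line through u and the origin.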


In Exercises 17-22, determine which sets of vectors are orthonormal.
If a set is only orthogonal, normalize the vectors to produce
an orthonormal set.



"1/3“ 


'-1/2 " 


"0" 


0" 

17. 

1/3 

5 

0 

18. 

1 

5 

-1 


1/3 


1/2 _ 


0 


0 


19. 

—.6 
.8 _ 

5 

\8" 

.6 

20. 

-2/3 

1/3 

5 

'1/3' 

2/3 




2/3 _ 


0 



■ i/yir 


3/yir 


0 

21. 

3 /V 20 

5 

- 1 /V 20 

5 

- 1 /V 2 




_— 1 /V 20 . 


1 /V 2 


' 1 /vTsT 


1 /V 2 ' 


'- 2 / 3 ' 

22. 

4/a/T8 

5 

0 

9 

1/3 


_ l/vTI _ 


1/V2_ 


_ — 2/3 _ 


In Exercises 23 and 24, all vectors are in $\mathbb{R}^n$. Mark each statement
True or False. Justify each answer.

23. a. Not every linearly independent set in $\mathbb{R}^n$ is an orthogonal
set.

b. If y is a linear combination of nonzero vectors from an
orthogonal set, then the weights in the linear combination
can be computed without row operations on a matrix.

c. If the vectors in an orthogonal set of nonzero vectors are
normalized, then some of the new vectors may not be
orthogonal.

d. A matrix with orthonormal columns is an orthogonal
matrix.

e. If L is a line through 0 and if $\hat{y}$ is the orthogonal projection
of y onto L, then $\|\hat{y}\|$ gives the distance from y to L.


24. a. Not every orthogonal set in $\mathbb{R}^n$ is linearly independent.

b. If a set $S = \{u_1, \ldots, u_p\}$ has the property that $u_i \cdot u_j = 0$
whenever $i \neq j$, then S is an orthonormal set.

c. If the columns of an $m \times n$ matrix A are orthonormal, then
the linear mapping $x \mapsto Ax$ preserves lengths.

d. The orthogonal projection of y onto v is the same as the
orthogonal projection of y onto cv whenever $c \neq 0$.

e. An orthogonal matrix is invertible.


25. Prove Theorem 7. [Hint: For (a), compute $\|Ux\|^2$, or prove
(b) first.]

26. Suppose W is a subspace of $\mathbb{R}^n$ spanned by n nonzero
orthogonal vectors. Explain why $W = \mathbb{R}^n$.

27. Let U be a square matrix with orthonormal columns. Explain
why U is invertible. (Mention the theorems you use.)

28. Let U be an $n \times n$ orthogonal matrix. Show that the rows of
U form an orthonormal basis of $\mathbb{R}^n$.

29. Let U and V be $n \times n$ orthogonal matrices. Explain why
UV is an orthogonal matrix. [That is, explain why UV is
invertible and its inverse is $(UV)^T$.]

30. Let U be an orthogonal matrix, and construct V by
interchanging some of the columns of U. Explain why V is an
orthogonal matrix.

31. Show that the orthogonal projection of a vector y onto a line
L through the origin in $\mathbb{R}^2$ does not depend on the choice
of the nonzero u in L used in the formula for $\hat{y}$. To do this,
suppose y and u are given and $\hat{y}$ has been computed by
formula (2) in this section. Replace u in that formula by cu,
where c is an unspecified nonzero scalar. Show that the new
formula gives the same $\hat{y}$.

32. Let $\{v_1, v_2\}$ be an orthogonal set of nonzero vectors, and let
$c_1$, $c_2$ be any nonzero scalars. Show that $\{c_1 v_1, c_2 v_2\}$ is also
an orthogonal set. Since orthogonality of a set is defined in
terms of pairs of vectors, this shows that if the vectors in
an orthogonal set are normalized, the new set will still be
orthogonal.

33. Given $u \neq 0$ in $\mathbb{R}^n$, let L = Span {u}. Show that the mapping
$x \mapsto \operatorname{proj}_L x$ is a linear transformation.

34. Given $u \neq 0$ in $\mathbb{R}^n$, let L = Span {u}. For y in $\mathbb{R}^n$, the
reflection of y in L is the point $\operatorname{refl}_L y$ defined by

$$\operatorname{refl}_L y = 2 \cdot \operatorname{proj}_L y - y$$

See the figure, which shows that $\operatorname{refl}_L y$ is the sum of
$\hat{y} = \operatorname{proj}_L y$ and $\hat{y} - y$. Show that the mapping $y \mapsto \operatorname{refl}_L y$
is a linear transformation.

The reflection of y in a line through the origin.

35. [M] Show that the columns of the matrix A are orthogonal
by making an appropriate matrix calculation. State the
calculation you use.



36. [M] In parts (a)-(d), let U be the matrix formed by
normalizing each column of the matrix A in Exercise 35.

a. Compute $U^T U$ and $U U^T$. How do they differ?

b. Generate a random vector y in $\mathbb{R}^8$, and compute
$p = U U^T y$ and $z = y - p$. Explain why p is in Col A.
Verify that z is orthogonal to p.

c. Verify that z is orthogonal to each column of U.

d. Notice that y = p + z, with p in Col A. Explain why z is
in $(\operatorname{Col} A)^\perp$. (The significance of this decomposition of
y will be explained in the next section.)



Mastering: Orthogonal 
Basis 6-4 


SOLUTIONS TO PRACTICE PROBLEMS 


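1. The vectors are orthogonal because

$$u_1 \cdot u_2 = -2/5 + 2/5 = 0$$

They are unit vectors because

$$\begin{aligned}
\|u_1\|^2 &= (-1/\sqrt{5})^2 + (2/\sqrt{5})^2 = 1/5 + 4/5 = 1 \\
\|u_2\|^2 &= (2/\sqrt{5})^2 + (1/\sqrt{5})^2 = 4/5 + 1/5 = 1
\end{aligned}$$

In particular, the set $\{u_1, u_2\}$ is linearly independent, and hence is a basis for $\mathbb{R}^2$ since
there are two vectors in the set.


2. When $y = \begin{bmatrix} 7 \\ 6 \end{bmatrix}$ and $u = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$,

$$\hat{y} = \frac{y \cdot u}{u \cdot u} u = \frac{20}{5} \begin{bmatrix} 2 \\ 1 \end{bmatrix} = 4 \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 8 \\ 4 \end{bmatrix}$$

This is the same $\hat{y}$ found in Example 3. The orthogonal projection does not seem to
depend on the u chosen on the line. See Exercise 31.


3. $Uy = \begin{bmatrix} 1/\sqrt{2} & 2/3 \\ 1/\sqrt{2} & -2/3 \\ 0 & 1/3 \end{bmatrix} \begin{bmatrix} -3\sqrt{2} \\ 6 \end{bmatrix} = \begin{bmatrix} 1 \\ -7 \\ 2 \end{bmatrix}$

Also, from Example 6, $x = \begin{bmatrix} \sqrt{2} \\ 3 \end{bmatrix}$ and $Ux = \begin{bmatrix} 3 \\ -1 \\ 1 \end{bmatrix}$. Hence

$$Ux \cdot Uy = 3 + 7 + 2 = 12, \quad \text{and} \quad x \cdot y = -6 + 18 = 12$$


4. Since U is an $n \times n$ matrix with orthonormal columns, by Theorem 6, $U^T U = I$.
Taking the determinant of the left side of this equation, and applying Theorems 5
and 6 from Section 3.2 results in $\det U^T U = (\det U^T)(\det U) = (\det U)(\det U) = (\det U)^2$.
Recall $\det I = 1$. Putting the two sides of the equation back together results
in $(\det U)^2 = 1$ and hence $\det U = \pm 1$.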
6.3 ORTHOGONAL PROJECTIONS



The orthogonal projection of a point in $\mathbb{R}^2$ onto a line through the origin has an important
analogue in $\mathbb{R}^n$. Given a vector y and a subspace W in $\mathbb{R}^n$, there is a vector $\hat{y}$ in W such
that (1) $\hat{y}$ is the unique vector in W for which $y - \hat{y}$ is orthogonal to W, and (2) $\hat{y}$ is
the unique vector in W closest to y. See Figure 1. These two properties of $\hat{y}$ provide the
key to finding least-squares solutions of linear systems, mentioned in the introductory
example for this chapter. The full story will be told in Section 6.5.

To prepare for the first theorem, observe that whenever a vector y is written as a
linear combination of vectors $u_1, \ldots, u_n$ in $\mathbb{R}^n$, the terms in the sum for y can be grouped
into two parts so that y can be written as

$$y = z_1 + z_2$$

where $z_1$ is a linear combination of some of the $u_i$ and $z_2$ is a linear combination of
the rest of the $u_i$. This idea is particularly useful when $\{u_1, \ldots, u_n\}$ is an orthogonal
basis. Recall from Section 6.1 that $W^\perp$ denotes the set of all vectors orthogonal to a
subspace W.


EXAMPLE 1 Let $\{u_1, \ldots, u_5\}$ be an orthogonal basis for $\mathbb{R}^5$ and let

$$y = c_1 u_1 + \cdots + c_5 u_5$$

Consider the subspace $W = \operatorname{Span}\{u_1, u_2\}$, and write y as the sum of a vector $z_1$ in W
and a vector $z_2$ in $W^\perp$.


SOLUTION Write

$$y = \underbrace{c_1 u_1 + c_2 u_2}_{z_1} + \underbrace{c_3 u_3 + c_4 u_4 + c_5 u_5}_{z_2}$$

where $z_1 = c_1 u_1 + c_2 u_2$ is in $\operatorname{Span}\{u_1, u_2\}$
and $z_2 = c_3 u_3 + c_4 u_4 + c_5 u_5$ is in $\operatorname{Span}\{u_3, u_4, u_5\}$.

To show that $z_2$ is in $W^\perp$, it suffices to show that $z_2$ is orthogonal to the vectors in the
basis $\{u_1, u_2\}$ for W. (See Section 6.1.) Using properties of the inner product, compute

$$\begin{aligned}
z_2 \cdot u_1 &= (c_3 u_3 + c_4 u_4 + c_5 u_5) \cdot u_1 \\
&= c_3 u_3 \cdot u_1 + c_4 u_4 \cdot u_1 + c_5 u_5 \cdot u_1 \\
&= 0
\end{aligned}$$

because $u_1$ is orthogonal to $u_3$, $u_4$, and $u_5$. A similar calculation shows that $z_2 \cdot u_2 = 0$.
Thus $z_2$ is in $W^\perp$. ■

The next theorem shows that the decomposition $y = z_1 + z_2$ in Example 1 can be
computed without having an orthogonal basis for $\mathbb{R}^n$. It is enough to have an orthogonal
basis only for W.
THEOREM 8


The Orthogonal Decomposition Theorem

Let W be a subspace of $\mathbb{R}^n$. Then each y in $\mathbb{R}^n$ can be written uniquely in the form

$$y = \hat{y} + z \qquad (1)$$

where $\hat{y}$ is in W and z is in $W^\perp$. In fact, if $\{u_1, \ldots, u_p\}$ is any orthogonal basis of
W, then

$$\hat{y} = \frac{y \cdot u_1}{u_1 \cdot u_1} u_1 + \cdots + \frac{y \cdot u_p}{u_p \cdot u_p} u_p \qquad (2)$$

and $z = y - \hat{y}$.


The vector $\hat{y}$ in (1) is called the orthogonal projection of y onto W and often is
written as $\operatorname{proj}_W y$. See Figure 2. When W is a one-dimensional subspace, the formula
for $\hat{y}$ matches the formula given in Section 6.2.




FIGURE 2 The orthogonal projection 
of y onto W . 


PROOF Let $\{u_1, \ldots, u_p\}$ be any orthogonal basis for W, and define $\hat{y}$ by (2).$^1$ Then $\hat{y}$
is in W because $\hat{y}$ is a linear combination of the basis $u_1, \ldots, u_p$. Let $z = y - \hat{y}$. Since
$u_1$ is orthogonal to $u_2, \ldots, u_p$, it follows from (2) that

$$\begin{aligned}
z \cdot u_1 = (y - \hat{y}) \cdot u_1 &= y \cdot u_1 - \left( \frac{y \cdot u_1}{u_1 \cdot u_1} \right) u_1 \cdot u_1 - 0 - \cdots - 0 \\
&= y \cdot u_1 - y \cdot u_1 = 0
\end{aligned}$$

Thus z is orthogonal to $u_1$. Similarly, z is orthogonal to each $u_j$ in the basis for W.
Hence z is orthogonal to every vector in W. That is, z is in $W^\perp$.

To show that the decomposition in (1) is unique, suppose y can also be written as
$y = \hat{y}_1 + z_1$, with $\hat{y}_1$ in W and $z_1$ in $W^\perp$. Then $\hat{y} + z = \hat{y}_1 + z_1$ (since both sides equal
y), and so

$$\hat{y} - \hat{y}_1 = z_1 - z$$

This equality shows that the vector $v = \hat{y} - \hat{y}_1$ is in W and in $W^\perp$ (because $z_1$ and z
are both in $W^\perp$, and $W^\perp$ is a subspace). Hence $v \cdot v = 0$, which shows that v = 0. This
proves that $\hat{y} = \hat{y}_1$ and also $z_1 = z$. ■


The uniqueness of the decomposition (1) shows that the orthogonal projection $\hat{y}$
depends only on W and not on the particular basis used in (2).


1 We may assume that W is not the zero subspace, for otherwise $W^\perp = \mathbb{R}^n$ and (1) is simply y = 0 + y.
The next section will show that any nonzero subspace of $\mathbb{R}^n$ has an orthogonal basis.













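EXAMPLE 2 Let $u_1 = \begin{bmatrix} 2 \\ 5 \\ -1 \end{bmatrix}$, $u_2 = \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix}$, and $y = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$. Observe that $\{u_1, u_2\}$
is an orthogonal basis for $W = \operatorname{Span}\{u_1, u_2\}$. Write y as the sum of a vector in W and
a vector orthogonal to W.


SOLUTION The orthogonal projection of y onto W is

$$\begin{aligned}
\hat{y} &= \frac{y \cdot u_1}{u_1 \cdot u_1} u_1 + \frac{y \cdot u_2}{u_2 \cdot u_2} u_2 \\
&= \frac{9}{30} \begin{bmatrix} 2 \\ 5 \\ -1 \end{bmatrix} + \frac{3}{6} \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix} = \frac{9}{30} \begin{bmatrix} 2 \\ 5 \\ -1 \end{bmatrix} + \frac{15}{30} \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} -2/5 \\ 2 \\ 1/5 \end{bmatrix}
\end{aligned}$$

Also

$$y - \hat{y} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} - \begin{bmatrix} -2/5 \\ 2 \\ 1/5 \end{bmatrix} = \begin{bmatrix} 7/5 \\ 0 \\ 14/5 \end{bmatrix}$$

Theorem 8 ensures that $y - \hat{y}$ is in $W^\perp$. To check the calculations, however, it is a good
idea to verify that $y - \hat{y}$ is orthogonal to both $u_1$ and $u_2$ and hence to all of W. The
desired decomposition of y is

$$y = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} -2/5 \\ 2 \\ 1/5 \end{bmatrix} + \begin{bmatrix} 7/5 \\ 0 \\ 14/5 \end{bmatrix}$$ ■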
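
The decomposition in Example 2 can be reproduced with a few lines of code. The sketch below assumes NumPy, and `project_onto_subspace` is a hypothetical helper that applies formula (2) of Theorem 8. It is valid only when the basis passed in is orthogonal.

```python
import numpy as np

def project_onto_subspace(y, orthogonal_basis):
    """Orthogonal projection of y onto the span of an ORTHOGONAL basis."""
    y = np.asarray(y, dtype=float)
    return sum((np.dot(y, u) / np.dot(u, u)) * u for u in orthogonal_basis)

u1 = np.array([2.0, 5.0, -1.0])
u2 = np.array([-2.0, 1.0, 1.0])
y = np.array([1.0, 2.0, 3.0])

y_hat = project_onto_subspace(y, [u1, u2])   # the part of y in W
z = y - y_hat                                # the part of y orthogonal to W
print(y_hat, z, np.dot(z, u1), np.dot(z, u2))
```

The two inner products printed at the end carry out the check recommended in the example: the remainder is orthogonal to both basis vectors.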



A Geometric Interpretation of the Orthogonal Projection 

When W is a one-dimensional subspace, the formula (2) for $\operatorname{proj}_W y$ contains just one
term. Thus, when dim W > 1, each term in (2) is itself an orthogonal projection of y
onto a one-dimensional subspace spanned by one of the u's in the basis for W. Figure 3
illustrates this when W is a subspace of $\mathbb{R}^3$ spanned by $u_1$ and $u_2$. Here $\hat{y}_1$ and $\hat{y}_2$ denote
the projections of y onto the lines spanned by $u_1$ and $u_2$, respectively. The orthogonal
projection $\hat{y}$ of y onto W is the sum of the projections of y onto one-dimensional
subspaces that are orthogonal to each other. The vector $\hat{y}$ in Figure 3 corresponds to the
vector y in Figure 4 of Section 6.2, because now it is $\hat{y}$ that is in W.



FIGURE 3 The orthogonal projection of y is the 
sum of its projections onto one-dimensional 
subspaces that are mutually orthogonal. 
Properties of Orthogonal Projections

If $\{u_1, \ldots, u_p\}$ is an orthogonal basis for W and if y happens to be in W, then the
formula for $\operatorname{proj}_W y$ is exactly the same as the representation of y given in Theorem 5
in Section 6.2. In this case, $\operatorname{proj}_W y = y$.

If y is in $W = \operatorname{Span}\{u_1, \ldots, u_p\}$, then $\operatorname{proj}_W y = y$.

This fact also follows from the next theorem.


THEOREM 9 The Best Approximation Theorem

Let W be a subspace of $\mathbb{R}^n$, let y be any vector in $\mathbb{R}^n$, and let $\hat{y}$ be the orthogonal
projection of y onto W. Then $\hat{y}$ is the closest point in W to y, in the sense that

$$\|y - \hat{y}\| < \|y - v\| \qquad (3)$$

for all v in W distinct from $\hat{y}$.


The vector $\hat{y}$ in Theorem 9 is called the best approximation to y by elements of W.
Later sections in the text will examine problems where a given y must be replaced, or
approximated, by a vector v in some fixed subspace W. The distance from y to v, given
by $\|y - v\|$, can be regarded as the "error" of using v in place of y. Theorem 9 says that
this error is minimized when $v = \hat{y}$.

Inequality (3) leads to a new proof that $\hat{y}$ does not depend on the particular
orthogonal basis used to compute it. If a different orthogonal basis for W were used to construct
an orthogonal projection of y, then this projection would also be the closest point in W
to y, namely, $\hat{y}$.


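PROOF Take $\mathbf{v}$ in $W$ distinct from $\hat{\mathbf{y}}$. See Figure 4. Then $\hat{\mathbf{y}} - \mathbf{v}$ is in $W$. By the Orthogonal
Decomposition Theorem, $\mathbf{y} - \hat{\mathbf{y}}$ is orthogonal to $W$. In particular, $\mathbf{y} - \hat{\mathbf{y}}$ is orthogonal
to $\hat{\mathbf{y}} - \mathbf{v}$ (which is in $W$). Since

$$\mathbf{y} - \mathbf{v} = (\mathbf{y} - \hat{\mathbf{y}}) + (\hat{\mathbf{y}} - \mathbf{v})$$

the Pythagorean Theorem gives

$$\|\mathbf{y} - \mathbf{v}\|^2 = \|\mathbf{y} - \hat{\mathbf{y}}\|^2 + \|\hat{\mathbf{y}} - \mathbf{v}\|^2$$

(See the colored right triangle in Figure 4. The length of each side is labeled.) Now
$\|\hat{\mathbf{y}} - \mathbf{v}\|^2 > 0$ because $\hat{\mathbf{y}} - \mathbf{v} \neq \mathbf{0}$, and so inequality (3) follows immediately. ■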



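FIGURE 4 The orthogonal projection
of $\mathbf{y}$ onto $W$ is the closest point in $W$
to $\mathbf{y}$.


EXAMPLE 3 If $\mathbf{u}_1 = \begin{bmatrix} 2 \\ 5 \\ -1 \end{bmatrix}$, $\mathbf{u}_2 = \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix}$, $\mathbf{y} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$, and $W = \operatorname{Span}\{\mathbf{u}_1, \mathbf{u}_2\}$,
as in Example 2, then the closest point in $W$ to $\mathbf{y}$ is

$$\hat{\mathbf{y}} = \frac{\mathbf{y} \cdot \mathbf{u}_1}{\mathbf{u}_1 \cdot \mathbf{u}_1}\,\mathbf{u}_1 + \frac{\mathbf{y} \cdot \mathbf{u}_2}{\mathbf{u}_2 \cdot \mathbf{u}_2}\,\mathbf{u}_2 = \begin{bmatrix} -2/5 \\ 2 \\ 1/5 \end{bmatrix} \qquad \blacksquare$$


EXAMPLE 4 The distance from a point $\mathbf{y}$ in $\mathbb{R}^n$ to a subspace $W$ is defined as the
distance from $\mathbf{y}$ to the nearest point in $W$. Find the distance from $\mathbf{y}$ to $W = \operatorname{Span}\{\mathbf{u}_1, \mathbf{u}_2\}$,
where

$$\mathbf{y} = \begin{bmatrix} -1 \\ -5 \\ 10 \end{bmatrix}, \quad \mathbf{u}_1 = \begin{bmatrix} 5 \\ -2 \\ 1 \end{bmatrix}, \quad \mathbf{u}_2 = \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}$$

SOLUTION By the Best Approximation Theorem, the distance from $\mathbf{y}$ to $W$ is $\|\mathbf{y} - \hat{\mathbf{y}}\|$,
where $\hat{\mathbf{y}} = \operatorname{proj}_W \mathbf{y}$. Since $\{\mathbf{u}_1, \mathbf{u}_2\}$ is an orthogonal basis for $W$,

$$\hat{\mathbf{y}} = \frac{15}{30}\,\mathbf{u}_1 + \frac{-21}{6}\,\mathbf{u}_2 = \frac{1}{2}\begin{bmatrix} 5 \\ -2 \\ 1 \end{bmatrix} - \frac{7}{2}\begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix} = \begin{bmatrix} -1 \\ -8 \\ 4 \end{bmatrix}$$

$$\mathbf{y} - \hat{\mathbf{y}} = \begin{bmatrix} -1 \\ -5 \\ 10 \end{bmatrix} - \begin{bmatrix} -1 \\ -8 \\ 4 \end{bmatrix} = \begin{bmatrix} 0 \\ 3 \\ 6 \end{bmatrix}$$

$$\|\mathbf{y} - \hat{\mathbf{y}}\|^2 = 3^2 + 6^2 = 45$$

The distance from $\mathbf{y}$ to $W$ is $\sqrt{45} = 3\sqrt{5}$. ■

The final theorem in this section shows how formula (2) for $\operatorname{proj}_W \mathbf{y}$ is simplified
when the basis for $W$ is an orthonormal set.


THEOREM 10

If $\{\mathbf{u}_1, \dots, \mathbf{u}_p\}$ is an orthonormal basis for a subspace $W$ of $\mathbb{R}^n$, then

$$\operatorname{proj}_W \mathbf{y} = (\mathbf{y} \cdot \mathbf{u}_1)\mathbf{u}_1 + (\mathbf{y} \cdot \mathbf{u}_2)\mathbf{u}_2 + \cdots + (\mathbf{y} \cdot \mathbf{u}_p)\mathbf{u}_p \qquad (4)$$

If $U = [\,\mathbf{u}_1 \;\; \mathbf{u}_2 \;\; \cdots \;\; \mathbf{u}_p\,]$, then

$$\operatorname{proj}_W \mathbf{y} = UU^T\mathbf{y} \quad \text{for all } \mathbf{y} \text{ in } \mathbb{R}^n \qquad (5)$$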


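PROOF Formula (4) follows immediately from (2) in Theorem 8. Also, (4) shows
that $\operatorname{proj}_W \mathbf{y}$ is a linear combination of the columns of $U$ using the weights $\mathbf{y} \cdot \mathbf{u}_1$,
$\mathbf{y} \cdot \mathbf{u}_2, \dots, \mathbf{y} \cdot \mathbf{u}_p$. The weights can be written as $\mathbf{u}_1^T\mathbf{y}, \mathbf{u}_2^T\mathbf{y}, \dots, \mathbf{u}_p^T\mathbf{y}$, showing that they
are the entries in $U^T\mathbf{y}$ and justifying (5). ■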
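
The projection formula (2) and the distance computation of Example 4 can be sketched in a few lines of Python. This is only an illustrative sketch: the helper names `dot` and `project` are assumptions made for this example, exact fractions are used to mirror the hand calculation, and the basis passed in is assumed (not checked) to be orthogonal.

```python
from fractions import Fraction
from math import sqrt


def dot(a, b):
    """Standard inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project(y, orthogonal_basis):
    """Orthogonal projection of y onto the span of an orthogonal basis.

    Implements formula (2): the sum of ((y . u) / (u . u)) u over the basis.
    The basis vectors are assumed to be nonzero and mutually orthogonal.
    """
    y_hat = [Fraction(0)] * len(y)
    for u in orthogonal_basis:
        weight = Fraction(dot(y, u), dot(u, u))
        y_hat = [p + weight * ui for p, ui in zip(y_hat, u)]
    return y_hat


# Data of Example 4.
y = [-1, -5, 10]
u1 = [5, -2, 1]
u2 = [1, 2, -1]

y_hat = project(y, [u1, u2])                  # closest point in W to y
residual = [a - b for a, b in zip(y, y_hat)]  # y - y_hat, orthogonal to W
distance = sqrt(dot(residual, residual))      # distance from y to W

print(y_hat, residual, distance)
```

Checking that `residual` has zero inner product with each basis vector is a quick way to catch arithmetic slips, as the text recommends for hand calculations.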


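Suppose $U$ is an $n \times p$ matrix with orthonormal columns, and let $W$ be the column
space of $U$. Then

$$U^TU\mathbf{x} = I_p\mathbf{x} = \mathbf{x} \quad \text{for all } \mathbf{x} \text{ in } \mathbb{R}^p \qquad \text{(Theorem 6)}$$

$$UU^T\mathbf{y} = \operatorname{proj}_W \mathbf{y} \quad \text{for all } \mathbf{y} \text{ in } \mathbb{R}^n \qquad \text{(Theorem 10)}$$

If $U$ is an $n \times n$ (square) matrix with orthonormal columns, then $U$ is an orthogonal
matrix, the column space $W$ is all of $\mathbb{R}^n$, and $UU^T\mathbf{y} = I\mathbf{y} = \mathbf{y}$ for all $\mathbf{y}$ in $\mathbb{R}^n$.

Although formula (4) is important for theoretical purposes, in practice it usually
involves calculations with square roots of numbers (in the entries of the $\mathbf{u}_i$). Formula
(2) is recommended for hand calculations.


PRACTICE PROBLEMS

1. Let $\mathbf{u}_1 = \begin{bmatrix} -7 \\ 1 \\ 4 \end{bmatrix}$, $\mathbf{u}_2 = \begin{bmatrix} -1 \\ 1 \\ -2 \end{bmatrix}$, $\mathbf{y} = \begin{bmatrix} -9 \\ 1 \\ 6 \end{bmatrix}$, and $W = \operatorname{Span}\{\mathbf{u}_1, \mathbf{u}_2\}$. Use the fact
that $\mathbf{u}_1$ and $\mathbf{u}_2$ are orthogonal to compute $\operatorname{proj}_W \mathbf{y}$.

2. Let $W$ be a subspace of $\mathbb{R}^n$. Let $\mathbf{x}$ and $\mathbf{y}$ be vectors in $\mathbb{R}^n$ and let $\mathbf{z} = \mathbf{x} + \mathbf{y}$. If $\mathbf{u}$ is
the projection of $\mathbf{x}$ onto $W$ and $\mathbf{v}$ is the projection of $\mathbf{y}$ onto $W$, show that $\mathbf{u} + \mathbf{v}$ is
the projection of $\mathbf{z}$ onto $W$.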


6.3 EXERCISES 


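In Exercises 1 and 2, you may assume that $\{\mathbf{u}_1, \dots, \mathbf{u}_4\}$ is an
orthogonal basis for $\mathbb{R}^4$.

1. $\mathbf{u}_1 = \begin{bmatrix} 0 \\ 1 \\ -4 \\ -1 \end{bmatrix}$, $\mathbf{u}_2 = \begin{bmatrix} 3 \\ 5 \\ 1 \\ 1 \end{bmatrix}$, $\mathbf{u}_3 = \begin{bmatrix} 1 \\ 0 \\ 1 \\ -4 \end{bmatrix}$, $\mathbf{u}_4 = \begin{bmatrix} 5 \\ -3 \\ -1 \\ 1 \end{bmatrix}$,
$\mathbf{x} = \begin{bmatrix} 10 \\ -8 \\ 2 \\ 0 \end{bmatrix}$. Write $\mathbf{x}$ as the sum of two vectors, one in
$\operatorname{Span}\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ and the other in $\operatorname{Span}\{\mathbf{u}_4\}$.

2. $\mathbf{u}_1 = \begin{bmatrix} 1 \\ 2 \\ 1 \\ 1 \end{bmatrix}$, $\mathbf{u}_2 = \begin{bmatrix} -2 \\ 1 \\ -1 \\ 1 \end{bmatrix}$, $\mathbf{u}_3 = \begin{bmatrix} 1 \\ 1 \\ -2 \\ -1 \end{bmatrix}$, $\mathbf{u}_4 = \begin{bmatrix} -1 \\ 1 \\ 1 \\ -2 \end{bmatrix}$,
$\mathbf{v} = \begin{bmatrix} 4 \\ 5 \\ -3 \\ 3 \end{bmatrix}$. Write $\mathbf{v}$ as the sum of two vectors, one in
$\operatorname{Span}\{\mathbf{u}_1\}$ and the other in $\operatorname{Span}\{\mathbf{u}_2, \mathbf{u}_3, \mathbf{u}_4\}$.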


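In Exercises 3-6, verify that $\{\mathbf{u}_1, \mathbf{u}_2\}$ is an orthogonal set, and then
find the orthogonal projection of $\mathbf{y}$ onto $\operatorname{Span}\{\mathbf{u}_1, \mathbf{u}_2\}$.

3. $\mathbf{y} = \begin{bmatrix} -1 \\ 4 \\ 3 \end{bmatrix}$, $\mathbf{u}_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}$, $\mathbf{u}_2 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}$

4. $\mathbf{y} = \begin{bmatrix} 6 \\ 3 \\ -2 \end{bmatrix}$, $\mathbf{u}_1 = \begin{bmatrix} 3 \\ 4 \\ 0 \end{bmatrix}$, $\mathbf{u}_2 = \begin{bmatrix} -4 \\ 3 \\ 0 \end{bmatrix}$

5. $\mathbf{y} = \begin{bmatrix} -1 \\ 2 \\ 6 \end{bmatrix}$, $\mathbf{u}_1 = \begin{bmatrix} 3 \\ -1 \\ 2 \end{bmatrix}$, $\mathbf{u}_2 = \begin{bmatrix} 1 \\ -1 \\ -2 \end{bmatrix}$

6. $\mathbf{y} = \begin{bmatrix} 6 \\ 4 \\ 1 \end{bmatrix}$, $\mathbf{u}_1 = \begin{bmatrix} -4 \\ -1 \\ 1 \end{bmatrix}$, $\mathbf{u}_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$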

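In Exercises 7-10, let $W$ be the subspace spanned by the $\mathbf{u}$'s, and
write $\mathbf{y}$ as the sum of a vector in $W$ and a vector orthogonal to $W$.

7. $\mathbf{y} = \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix}$, $\mathbf{u}_1 = \begin{bmatrix} 1 \\ 3 \\ -2 \end{bmatrix}$, $\mathbf{u}_2 = \begin{bmatrix} 5 \\ 1 \\ 4 \end{bmatrix}$


In Exercises 11 and 12, find the closest point to $\mathbf{y}$ in the subspace
$W$ spanned by $\mathbf{v}_1$ and $\mathbf{v}_2$.


In Exercises 13 and 14, find the best approximation to $\mathbf{z}$ by vectors
of the form $c_1\mathbf{v}_1 + c_2\mathbf{v}_2$.

13. $\mathbf{z} = \begin{bmatrix} 3 \\ -7 \\ 2 \\ 3 \end{bmatrix}$, $\mathbf{v}_1 = \begin{bmatrix} 2 \\ -1 \\ -3 \\ 1 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 1 \\ 1 \\ 0 \\ -1 \end{bmatrix}$

14. $\mathbf{z} = \begin{bmatrix} 2 \\ 4 \\ 0 \\ -1 \end{bmatrix}$, $\mathbf{v}_1 = \begin{bmatrix} 2 \\ 0 \\ -1 \\ -3 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 5 \\ -2 \\ 4 \\ 2 \end{bmatrix}$

15. Let $\mathbf{y} = \begin{bmatrix} 5 \\ -9 \\ 5 \end{bmatrix}$, $\mathbf{u}_1 = \begin{bmatrix} -3 \\ -5 \\ 1 \end{bmatrix}$, $\mathbf{u}_2 = \begin{bmatrix} -3 \\ 2 \\ 1 \end{bmatrix}$. Find the distance
from $\mathbf{y}$ to the plane in $\mathbb{R}^3$ spanned by $\mathbf{u}_1$ and $\mathbf{u}_2$.

16. Let $\mathbf{y}$, $\mathbf{v}_1$, and $\mathbf{v}_2$ be as in Exercise 12. Find the distance from
$\mathbf{y}$ to the subspace of $\mathbb{R}^4$ spanned by $\mathbf{v}_1$ and $\mathbf{v}_2$.

17. Let $\mathbf{y} = \begin{bmatrix} 4 \\ 8 \\ 1 \end{bmatrix}$, $\mathbf{u}_1 = \begin{bmatrix} 2/3 \\ 1/3 \\ 2/3 \end{bmatrix}$, $\mathbf{u}_2 = \begin{bmatrix} -2/3 \\ 2/3 \\ 1/3 \end{bmatrix}$, and
$W = \operatorname{Span}\{\mathbf{u}_1, \mathbf{u}_2\}$.

a. Let $U = [\,\mathbf{u}_1 \;\; \mathbf{u}_2\,]$. Compute $U^TU$ and $UU^T$.

b. Compute $\operatorname{proj}_W \mathbf{y}$ and $(UU^T)\mathbf{y}$.

18. Let $\mathbf{y} = \begin{bmatrix} 7 \\ 9 \end{bmatrix}$, $\mathbf{u}_1 = \begin{bmatrix} 1/\sqrt{10} \\ -3/\sqrt{10} \end{bmatrix}$, and $W = \operatorname{Span}\{\mathbf{u}_1\}$.

a. Let $U$ be the $2 \times 1$ matrix whose only column is $\mathbf{u}_1$.
Compute $U^TU$ and $UU^T$.

b. Compute $\operatorname{proj}_W \mathbf{y}$ and $(UU^T)\mathbf{y}$.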



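19. Let $\mathbf{u}_1 = \begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix}$, $\mathbf{u}_2 = \begin{bmatrix} 5 \\ -1 \\ 2 \end{bmatrix}$, and $\mathbf{u}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$. Note that
$\mathbf{u}_1$ and $\mathbf{u}_2$ are orthogonal but that $\mathbf{u}_3$ is not orthogonal to $\mathbf{u}_1$ or
$\mathbf{u}_2$. It can be shown that $\mathbf{u}_3$ is not in the subspace $W$ spanned
by $\mathbf{u}_1$ and $\mathbf{u}_2$. Use this fact to construct a nonzero vector $\mathbf{v}$ in
$\mathbb{R}^3$ that is orthogonal to $\mathbf{u}_1$ and $\mathbf{u}_2$.


20. Let $\mathbf{u}_1$ and $\mathbf{u}_2$ be as in Exercise 19, and let $\mathbf{u}_4 = [\,\dots\,]$. It can
be shown that $\mathbf{u}_4$ is not in the subspace $W$ spanned by $\mathbf{u}_1$ and
$\mathbf{u}_2$. Use this fact to construct a nonzero vector $\mathbf{v}$ in $\mathbb{R}^3$ that is
orthogonal to $\mathbf{u}_1$ and $\mathbf{u}_2$.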


In Exercises 21 and 22, all vectors and subspaces are in $\mathbb{R}^n$. Mark
each statement True or False. Justify each answer.


21. a. If $\mathbf{z}$ is orthogonal to $\mathbf{u}_1$ and to $\mathbf{u}_2$ and if $W = \operatorname{Span}\{\mathbf{u}_1, \mathbf{u}_2\}$,
then $\mathbf{z}$ must be in $W^{\perp}$.

b. For each $\mathbf{y}$ and each subspace $W$, the vector $\mathbf{y} - \operatorname{proj}_W \mathbf{y}$
is orthogonal to $W$.

c. The orthogonal projection $\hat{\mathbf{y}}$ of $\mathbf{y}$ onto a subspace $W$ can
sometimes depend on the orthogonal basis for $W$ used to
compute $\hat{\mathbf{y}}$.

d. If $\mathbf{y}$ is in a subspace $W$, then the orthogonal projection of
$\mathbf{y}$ onto $W$ is $\mathbf{y}$ itself.

e. If the columns of an $n \times p$ matrix $U$ are orthonormal, then
$UU^T\mathbf{y}$ is the orthogonal projection of $\mathbf{y}$ onto the column
space of $U$.

22. a. If $W$ is a subspace of $\mathbb{R}^n$ and if $\mathbf{v}$ is in both $W$ and $W^{\perp}$,
then $\mathbf{v}$ must be the zero vector.

b. In the Orthogonal Decomposition Theorem, each term in
formula (2) for $\hat{\mathbf{y}}$ is itself an orthogonal projection of $\mathbf{y}$
onto a subspace of $W$.

c. If $\mathbf{y} = \mathbf{z}_1 + \mathbf{z}_2$, where $\mathbf{z}_1$ is in a subspace $W$ and $\mathbf{z}_2$ is in
$W^{\perp}$, then $\mathbf{z}_1$ must be the orthogonal projection of $\mathbf{y}$ onto
$W$.

d. The best approximation to $\mathbf{y}$ by elements of a subspace
$W$ is given by the vector $\mathbf{y} - \operatorname{proj}_W \mathbf{y}$.

e. If an $n \times p$ matrix $U$ has orthonormal columns, then
$UU^T\mathbf{x} = \mathbf{x}$ for all $\mathbf{x}$ in $\mathbb{R}^n$.


23. Let $A$ be an $m \times n$ matrix. Prove that every vector $\mathbf{x}$ in $\mathbb{R}^n$
can be written in the form $\mathbf{x} = \mathbf{p} + \mathbf{u}$, where $\mathbf{p}$ is in Row $A$
and $\mathbf{u}$ is in Nul $A$. Also, show that if the equation $A\mathbf{x} = \mathbf{b}$
is consistent, then there is a unique $\mathbf{p}$ in Row $A$ such that
$A\mathbf{p} = \mathbf{b}$.


24. Let $W$ be a subspace of $\mathbb{R}^n$ with an orthogonal basis
$\{\mathbf{w}_1, \dots, \mathbf{w}_p\}$, and let $\{\mathbf{v}_1, \dots, \mathbf{v}_q\}$ be an orthogonal basis for
$W^{\perp}$.

a. Explain why $\{\mathbf{w}_1, \dots, \mathbf{w}_p, \mathbf{v}_1, \dots, \mathbf{v}_q\}$ is an orthogonal
set.

b. Explain why the set in part (a) spans $\mathbb{R}^n$.

c. Show that $\dim W + \dim W^{\perp} = n$.


25. [M] Let $U$ be the $8 \times 4$ matrix in Exercise 36 in Section 6.2.
Find the closest point to $\mathbf{y} = (1,1,1,1,1,1,1,1)$ in Col $U$.
Write the keystrokes or commands you use to solve this
problem.

26. [M] Let $U$ be the matrix in Exercise 25. Find the distance
from $\mathbf{b} = (1,1,1,1,-1,-1,-1,-1)$ to Col $U$.


SOLUTION TO PRACTICE PROBLEMS 


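1. Compute

$$\operatorname{proj}_W \mathbf{y} = \frac{\mathbf{y} \cdot \mathbf{u}_1}{\mathbf{u}_1 \cdot \mathbf{u}_1}\,\mathbf{u}_1 + \frac{\mathbf{y} \cdot \mathbf{u}_2}{\mathbf{u}_2 \cdot \mathbf{u}_2}\,\mathbf{u}_2 = \frac{88}{66}\,\mathbf{u}_1 + \frac{-2}{6}\,\mathbf{u}_2 = \frac{4}{3}\begin{bmatrix} -7 \\ 1 \\ 4 \end{bmatrix} - \frac{1}{3}\begin{bmatrix} -1 \\ 1 \\ -2 \end{bmatrix} = \begin{bmatrix} -9 \\ 1 \\ 6 \end{bmatrix} = \mathbf{y}$$

In this case, $\mathbf{y}$ happens to be a linear combination of $\mathbf{u}_1$ and $\mathbf{u}_2$, so $\mathbf{y}$ is in $W$. The
closest point in $W$ to $\mathbf{y}$ is $\mathbf{y}$ itself.

2. Using Theorem 10, let $U$ be a matrix whose columns consist of an orthonormal
basis for $W$. Then $\operatorname{proj}_W \mathbf{z} = UU^T\mathbf{z} = UU^T(\mathbf{x} + \mathbf{y}) = UU^T\mathbf{x} + UU^T\mathbf{y} =
\operatorname{proj}_W \mathbf{x} + \operatorname{proj}_W \mathbf{y} = \mathbf{u} + \mathbf{v}$.


6.4 THE GRAM-SCHMIDT PROCESS


FIGURE 1 Construction of an orthogonal basis $\{\mathbf{v}_1, \mathbf{v}_2\}$.


The Gram-Schmidt process is a simple algorithm for producing an orthogonal or
orthonormal basis for any nonzero subspace of $\mathbb{R}^n$. The first two examples of the process
are aimed at hand calculation.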



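EXAMPLE 1 Let $W = \operatorname{Span}\{\mathbf{x}_1, \mathbf{x}_2\}$, where $\mathbf{x}_1 = \begin{bmatrix} 3 \\ 6 \\ 0 \end{bmatrix}$ and $\mathbf{x}_2 = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}$.
Construct an orthogonal basis $\{\mathbf{v}_1, \mathbf{v}_2\}$ for $W$.


SOLUTION The subspace $W$ is shown in Figure 1, along with $\mathbf{x}_1$, $\mathbf{x}_2$, and the projection
$\mathbf{p}$ of $\mathbf{x}_2$ onto $\mathbf{x}_1$. The component of $\mathbf{x}_2$ orthogonal to $\mathbf{x}_1$ is $\mathbf{x}_2 - \mathbf{p}$, which is in $W$ because
it is formed from $\mathbf{x}_2$ and a multiple of $\mathbf{x}_1$. Let $\mathbf{v}_1 = \mathbf{x}_1$ and

$$\mathbf{v}_2 = \mathbf{x}_2 - \mathbf{p} = \mathbf{x}_2 - \frac{\mathbf{x}_2 \cdot \mathbf{x}_1}{\mathbf{x}_1 \cdot \mathbf{x}_1}\,\mathbf{x}_1 = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix} - \frac{15}{45}\begin{bmatrix} 3 \\ 6 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix}$$

Then $\{\mathbf{v}_1, \mathbf{v}_2\}$ is an orthogonal set of nonzero vectors in $W$. Since $\dim W = 2$, the set
$\{\mathbf{v}_1, \mathbf{v}_2\}$ is a basis for $W$. ■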


The next example fully illustrates the Gram-Schmidt process. Study it carefully. 


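EXAMPLE 2 Let $\mathbf{x}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}$, $\mathbf{x}_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 1 \end{bmatrix}$, and $\mathbf{x}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}$. Then $\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3\}$ is
clearly linearly independent and thus is a basis for a subspace $W$ of $\mathbb{R}^4$. Construct an
orthogonal basis for $W$.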


SOLUTION 


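Step 1. Let $\mathbf{v}_1 = \mathbf{x}_1$ and $W_1 = \operatorname{Span}\{\mathbf{x}_1\} = \operatorname{Span}\{\mathbf{v}_1\}$.

Step 2. Let $\mathbf{v}_2$ be the vector produced by subtracting from $\mathbf{x}_2$ its projection onto the
subspace $W_1$. That is, let

$$\mathbf{v}_2 = \mathbf{x}_2 - \operatorname{proj}_{W_1} \mathbf{x}_2 = \mathbf{x}_2 - \frac{\mathbf{x}_2 \cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1}\,\mathbf{v}_1 \qquad \text{(since } \mathbf{v}_1 = \mathbf{x}_1\text{)}$$

$$= \begin{bmatrix} 0 \\ 1 \\ 1 \\ 1 \end{bmatrix} - \frac{3}{4}\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} -3/4 \\ 1/4 \\ 1/4 \\ 1/4 \end{bmatrix}$$

As in Example 1, $\mathbf{v}_2$ is the component of $\mathbf{x}_2$ orthogonal to $\mathbf{x}_1$, and $\{\mathbf{v}_1, \mathbf{v}_2\}$ is an
orthogonal basis for the subspace $W_2$ spanned by $\mathbf{x}_1$ and $\mathbf{x}_2$.

Step 2' (optional). If appropriate, scale $\mathbf{v}_2$ to simplify later computations. Since $\mathbf{v}_2$ has
fractional entries, it is convenient to scale it by a factor of 4 and replace $\{\mathbf{v}_1, \mathbf{v}_2\}$ by the
orthogonal basis

$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}, \quad \mathbf{v}_2' = \begin{bmatrix} -3 \\ 1 \\ 1 \\ 1 \end{bmatrix}$$

Step 3. Let $\mathbf{v}_3$ be the vector produced by subtracting from $\mathbf{x}_3$ its projection onto the
subspace $W_2$. Use the orthogonal basis $\{\mathbf{v}_1, \mathbf{v}_2'\}$ to compute this projection onto $W_2$:

$$\operatorname{proj}_{W_2} \mathbf{x}_3 = \underbrace{\frac{\mathbf{x}_3 \cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1}\,\mathbf{v}_1}_{\text{projection of } \mathbf{x}_3 \text{ onto } \mathbf{v}_1} + \underbrace{\frac{\mathbf{x}_3 \cdot \mathbf{v}_2'}{\mathbf{v}_2' \cdot \mathbf{v}_2'}\,\mathbf{v}_2'}_{\text{projection of } \mathbf{x}_3 \text{ onto } \mathbf{v}_2'} = \frac{2}{4}\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} + \frac{2}{12}\begin{bmatrix} -3 \\ 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 2/3 \\ 2/3 \\ 2/3 \end{bmatrix}$$

Then $\mathbf{v}_3$ is the component of $\mathbf{x}_3$ orthogonal to $W_2$, namely,

$$\mathbf{v}_3 = \mathbf{x}_3 - \operatorname{proj}_{W_2} \mathbf{x}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} - \begin{bmatrix} 0 \\ 2/3 \\ 2/3 \\ 2/3 \end{bmatrix} = \begin{bmatrix} 0 \\ -2/3 \\ 1/3 \\ 1/3 \end{bmatrix}$$

See Figure 2 for a diagram of this construction. Observe that $\mathbf{v}_3$ is in $W$, because $\mathbf{x}_3$
and $\operatorname{proj}_{W_2} \mathbf{x}_3$ are both in $W$. Thus $\{\mathbf{v}_1, \mathbf{v}_2', \mathbf{v}_3\}$ is an orthogonal set of nonzero vectors
and hence a linearly independent set in $W$. Note that $W$ is three-dimensional since it
was defined by a basis of three vectors. Hence, by the Basis Theorem in Section 4.5,
$\{\mathbf{v}_1, \mathbf{v}_2', \mathbf{v}_3\}$ is an orthogonal basis for $W$. ■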



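FIGURE 2 The construction of $\mathbf{v}_3$ from
$\mathbf{x}_3$ and $W_2$.

The proof of the next theorem shows that this strategy really works. Scaling of
vectors is not mentioned because that is used only to simplify hand calculations.


THEOREM 11 The Gram-Schmidt Process

Given a basis $\{\mathbf{x}_1, \dots, \mathbf{x}_p\}$ for a nonzero subspace $W$ of $\mathbb{R}^n$, define

$$\begin{aligned}
\mathbf{v}_1 &= \mathbf{x}_1 \\
\mathbf{v}_2 &= \mathbf{x}_2 - \frac{\mathbf{x}_2 \cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1}\,\mathbf{v}_1 \\
\mathbf{v}_3 &= \mathbf{x}_3 - \frac{\mathbf{x}_3 \cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1}\,\mathbf{v}_1 - \frac{\mathbf{x}_3 \cdot \mathbf{v}_2}{\mathbf{v}_2 \cdot \mathbf{v}_2}\,\mathbf{v}_2 \\
&\;\;\vdots \\
\mathbf{v}_p &= \mathbf{x}_p - \frac{\mathbf{x}_p \cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1}\,\mathbf{v}_1 - \frac{\mathbf{x}_p \cdot \mathbf{v}_2}{\mathbf{v}_2 \cdot \mathbf{v}_2}\,\mathbf{v}_2 - \cdots - \frac{\mathbf{x}_p \cdot \mathbf{v}_{p-1}}{\mathbf{v}_{p-1} \cdot \mathbf{v}_{p-1}}\,\mathbf{v}_{p-1}
\end{aligned}$$

Then $\{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ is an orthogonal basis for $W$. In addition

$$\operatorname{Span}\{\mathbf{v}_1, \dots, \mathbf{v}_k\} = \operatorname{Span}\{\mathbf{x}_1, \dots, \mathbf{x}_k\} \quad \text{for } 1 \le k \le p \qquad (1)$$


PROOF For $1 \le k \le p$, let $W_k = \operatorname{Span}\{\mathbf{x}_1, \dots, \mathbf{x}_k\}$. Set $\mathbf{v}_1 = \mathbf{x}_1$, so that $\operatorname{Span}\{\mathbf{v}_1\} =
\operatorname{Span}\{\mathbf{x}_1\}$. Suppose, for some $k < p$, we have constructed $\mathbf{v}_1, \dots, \mathbf{v}_k$ so that $\{\mathbf{v}_1, \dots, \mathbf{v}_k\}$
is an orthogonal basis for $W_k$. Define

$$\mathbf{v}_{k+1} = \mathbf{x}_{k+1} - \operatorname{proj}_{W_k} \mathbf{x}_{k+1} \qquad (2)$$

By the Orthogonal Decomposition Theorem, $\mathbf{v}_{k+1}$ is orthogonal to $W_k$. Note that
$\operatorname{proj}_{W_k} \mathbf{x}_{k+1}$ is in $W_k$ and hence also in $W_{k+1}$. Since $\mathbf{x}_{k+1}$ is in $W_{k+1}$, so is $\mathbf{v}_{k+1}$ (because
$W_{k+1}$ is a subspace and is closed under subtraction). Furthermore, $\mathbf{v}_{k+1} \neq \mathbf{0}$ because
$\mathbf{x}_{k+1}$ is not in $W_k = \operatorname{Span}\{\mathbf{x}_1, \dots, \mathbf{x}_k\}$. Hence $\{\mathbf{v}_1, \dots, \mathbf{v}_{k+1}\}$ is an orthogonal set of
nonzero vectors in the $(k+1)$-dimensional space $W_{k+1}$. By the Basis Theorem in Section 4.5,
this set is an orthogonal basis for $W_{k+1}$. Hence $W_{k+1} = \operatorname{Span}\{\mathbf{v}_1, \dots, \mathbf{v}_{k+1}\}$.
When $k + 1 = p$, the process stops. ■

Theorem 11 shows that any nonzero subspace $W$ of $\mathbb{R}^n$ has an orthogonal basis,
because an ordinary basis $\{\mathbf{x}_1, \dots, \mathbf{x}_p\}$ is always available (by Theorem 11 in Section 4.5),
and the Gram-Schmidt process depends only on the existence of orthogonal projections
onto subspaces of $W$ that already have orthogonal bases.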
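
The recursion in Theorem 11 translates almost line for line into code. The following is a minimal sketch, assuming the input vectors are linearly independent (this is not checked) and using exact fractions so that the output can be compared with a hand calculation; the function names `dot` and `gram_schmidt` are chosen for this illustration only.

```python
from fractions import Fraction


def dot(a, b):
    """Standard inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def gram_schmidt(xs):
    """Return the orthogonal vectors v_1, ..., v_p of Theorem 11.

    Each v_k is x_k minus its projection onto Span{v_1, ..., v_{k-1}}.
    The vectors in xs are assumed to be linearly independent, and no
    optional rescaling (Step 2' of Example 2) is applied.
    """
    vs = []
    for x in xs:
        v = [Fraction(c) for c in x]
        for w in vs:
            coeff = dot(x, w) / dot(w, w)  # (x_k . v_j) / (v_j . v_j)
            v = [vi - coeff * wi for vi, wi in zip(v, w)]
        vs.append(v)
    return vs


# The basis x_1, x_2, x_3 of Example 2.
xs = [[1, 1, 1, 1], [0, 1, 1, 1], [0, 0, 1, 1]]
vs = gram_schmidt(xs)
for v in vs:
    print([str(c) for c in v])
```

Because no rescaling is applied, the second vector comes out with fractional entries; it is a scalar multiple of the rescaled vector used in the hand calculation, so the third vector is unaffected.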


Orthonormal Bases 

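An orthonormal basis is constructed easily from an orthogonal basis $\{\mathbf{v}_1, \dots, \mathbf{v}_p\}$:
simply normalize (i.e., "scale") all the $\mathbf{v}_k$. When working problems by hand, this is
easier than normalizing each $\mathbf{v}_k$ as soon as it is found (because it avoids unnecessary
writing of square roots).

EXAMPLE 3 Example 1 constructed the orthogonal basis

$$\mathbf{v}_1 = \begin{bmatrix} 3 \\ 6 \\ 0 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix}$$

An orthonormal basis is

$$\mathbf{u}_1 = \frac{1}{\|\mathbf{v}_1\|}\,\mathbf{v}_1 = \frac{1}{\sqrt{45}}\begin{bmatrix} 3 \\ 6 \\ 0 \end{bmatrix} = \begin{bmatrix} 1/\sqrt{5} \\ 2/\sqrt{5} \\ 0 \end{bmatrix}, \quad \mathbf{u}_2 = \frac{1}{\|\mathbf{v}_2\|}\,\mathbf{v}_2 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \qquad \blacksquare$$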

QR Factorization of Matrices 

If an $m \times n$ matrix $A$ has linearly independent columns $\mathbf{x}_1, \dots, \mathbf{x}_n$, then applying the
Gram-Schmidt process (with normalizations) to $\mathbf{x}_1, \dots, \mathbf{x}_n$ amounts to factoring $A$, as
described in the next theorem. This factorization is widely used in computer algorithms
for various computations, such as solving equations (discussed in Section 6.5) and
finding eigenvalues (mentioned in the exercises for Section 5.2).

THEOREM 12 The QR Factorization

If $A$ is an $m \times n$ matrix with linearly independent columns, then $A$ can be factored
as $A = QR$, where $Q$ is an $m \times n$ matrix whose columns form an orthonormal
basis for Col $A$ and $R$ is an $n \times n$ upper triangular invertible matrix with positive
entries on its diagonal.


PROOF The columns of $A$ form a basis $\{\mathbf{x}_1, \dots, \mathbf{x}_n\}$ for Col $A$. Construct an orthonormal
basis $\{\mathbf{u}_1, \dots, \mathbf{u}_n\}$ for $W = \operatorname{Col} A$ with property (1) in Theorem 11. This basis may
be constructed by the Gram-Schmidt process or some other means. Let

$$Q = [\,\mathbf{u}_1 \;\; \mathbf{u}_2 \;\; \cdots \;\; \mathbf{u}_n\,]$$

For $k = 1, \dots, n$, $\mathbf{x}_k$ is in $\operatorname{Span}\{\mathbf{x}_1, \dots, \mathbf{x}_k\} = \operatorname{Span}\{\mathbf{u}_1, \dots, \mathbf{u}_k\}$. So there are
constants, $r_{1k}, \dots, r_{kk}$, such that

$$\mathbf{x}_k = r_{1k}\mathbf{u}_1 + \cdots + r_{kk}\mathbf{u}_k + 0 \cdot \mathbf{u}_{k+1} + \cdots + 0 \cdot \mathbf{u}_n$$

We may assume that $r_{kk} \ge 0$. (If $r_{kk} < 0$, multiply both $r_{kk}$ and $\mathbf{u}_k$ by $-1$.) This shows
that $\mathbf{x}_k$ is a linear combination of the columns of $Q$ using as weights the entries in the
vector

$$\mathbf{r}_k = \begin{bmatrix} r_{1k} \\ \vdots \\ r_{kk} \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

That is, $\mathbf{x}_k = Q\mathbf{r}_k$ for $k = 1, \dots, n$. Let $R = [\,\mathbf{r}_1 \;\; \cdots \;\; \mathbf{r}_n\,]$. Then

$$A = [\,\mathbf{x}_1 \;\; \cdots \;\; \mathbf{x}_n\,] = [\,Q\mathbf{r}_1 \;\; \cdots \;\; Q\mathbf{r}_n\,] = QR$$

The fact that $R$ is invertible follows easily from the fact that the columns of $A$ are linearly
independent (Exercise 19). Since $R$ is clearly upper triangular, its nonnegative diagonal
entries must be positive. ■


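EXAMPLE 4 Find a QR factorization of $A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$.


SOLUTION The columns of $A$ are the vectors $\mathbf{x}_1$, $\mathbf{x}_2$, and $\mathbf{x}_3$ in Example 2. An
orthogonal basis for Col $A = \operatorname{Span}\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3\}$ was found in that example:

$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}, \quad \mathbf{v}_2' = \begin{bmatrix} -3 \\ 1 \\ 1 \\ 1 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} 0 \\ -2/3 \\ 1/3 \\ 1/3 \end{bmatrix}$$

To simplify the arithmetic that follows, scale $\mathbf{v}_3$ by letting $\mathbf{v}_3' = 3\mathbf{v}_3$. Then normalize
the three vectors to obtain $\mathbf{u}_1$, $\mathbf{u}_2$, and $\mathbf{u}_3$, and use these vectors as the columns of $Q$:

$$Q = \begin{bmatrix} 1/2 & -3/\sqrt{12} & 0 \\ 1/2 & 1/\sqrt{12} & -2/\sqrt{6} \\ 1/2 & 1/\sqrt{12} & 1/\sqrt{6} \\ 1/2 & 1/\sqrt{12} & 1/\sqrt{6} \end{bmatrix}$$

By construction, the first $k$ columns of $Q$ are an orthonormal basis of $\operatorname{Span}\{\mathbf{x}_1, \dots, \mathbf{x}_k\}$.
From the proof of Theorem 12, $A = QR$ for some $R$. To find $R$, observe that $Q^TQ = I$,
because the columns of $Q$ are orthonormal. Hence

$$Q^TA = Q^T(QR) = IR = R$$

and

$$R = \begin{bmatrix} 1/2 & 1/2 & 1/2 & 1/2 \\ -3/\sqrt{12} & 1/\sqrt{12} & 1/\sqrt{12} & 1/\sqrt{12} \\ 0 & -2/\sqrt{6} & 1/\sqrt{6} & 1/\sqrt{6} \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 3/2 & 1 \\ 0 & 3/\sqrt{12} & 2/\sqrt{12} \\ 0 & 0 & 2/\sqrt{6} \end{bmatrix} \qquad \blacksquare$$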

NUMERICAL NOTES

1. When the Gram-Schmidt process is run on a computer, roundoff error can
build up as the vectors $\mathbf{u}_k$ are calculated, one by one. For $j$ and $k$ large but
unequal, the inner products $\mathbf{u}_j^T\mathbf{u}_k$ may not be sufficiently close to zero. This
loss of orthogonality can be reduced substantially by rearranging the order
of the calculations.¹ However, a different computer-based QR factorization is
usually preferred to this modified Gram-Schmidt method because it yields a
more accurate orthonormal basis, even though the factorization requires about
twice as much arithmetic.

2. To produce a QR factorization of a matrix A, a computer program usually 
left-multiplies A by a sequence of orthogonal matrices until A is transformed 
into an upper triangular matrix. This construction is analogous to the left- 
multiplication by elementary matrices that produces an LU factorization of A . 
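
For small, well-behaved matrices, the construction in the proof of Theorem 12 can still be carried out directly in floating-point arithmetic. The sketch below uses the classical Gram-Schmidt process with normalization. It is not the orthogonal-matrix method described in Note 2, it is subject to the loss of orthogonality described in Note 1, and the function name `qr_gram_schmidt` is an assumption made for this illustration.

```python
from math import sqrt


def dot(a, b):
    """Standard inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def qr_gram_schmidt(columns):
    """QR factorization of a matrix given as a list of its columns.

    Returns (q_cols, R), where q_cols holds the orthonormal columns of Q
    and R is n x n upper triangular (a list of rows) with R = Q^T A.
    The columns are assumed to be linearly independent.
    """
    n = len(columns)
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for k, x in enumerate(columns):
        v = [float(c) for c in x]
        for j, q in enumerate(q_cols):
            R[j][k] = dot(q, x)  # entry (j, k) of Q^T A
            v = [vi - R[j][k] * qi for vi, qi in zip(v, q)]
        R[k][k] = sqrt(dot(v, v))  # positive for independent columns
        q_cols.append([vi / R[k][k] for vi in v])
    return q_cols, R


# Columns of the matrix A in Example 4.
A_cols = [[1, 1, 1, 1], [0, 1, 1, 1], [0, 0, 1, 1]]
q_cols, R = qr_gram_schmidt(A_cols)
for row in R:
    print([round(entry, 4) for entry in row])
```

The printed rows can be compared with the matrix $R$ found by hand in Example 4, whose irrational entries are $3/\sqrt{12}$, $2/\sqrt{12}$, and $2/\sqrt{6}$.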


PRACTICE PROBLEMS 


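1. Let $W = \operatorname{Span}\{\mathbf{x}_1, \mathbf{x}_2\}$, where $\mathbf{x}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ and $\mathbf{x}_2 = \begin{bmatrix} 1/3 \\ 1/3 \\ -2/3 \end{bmatrix}$. Construct an
orthonormal basis for $W$.


2. Suppose $A = QR$, where $Q$ is an $m \times n$ matrix with orthogonal columns and $R$ is
an $n \times n$ matrix. Show that if the columns of $A$ are linearly dependent, then $R$ cannot
be invertible.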


6.4 EXERCISES 


In Exercises 1-6, the given set is a basis for a subspace $W$. Use
the Gram-Schmidt process to produce an orthogonal basis for $W$.

1. $\begin{bmatrix} 3 \\ 0 \\ -1 \end{bmatrix}$, $\begin{bmatrix} 8 \\ 5 \\ -6 \end{bmatrix}$

2. $\begin{bmatrix} 0 \\ 4 \\ 2 \end{bmatrix}$, $\begin{bmatrix} 5 \\ 6 \\ -7 \end{bmatrix}$


¹ See Fundamentals of Matrix Computations, by David S. Watkins (New York: John Wiley & Sons, 1991),
pp. 167-180.

7. Find an orthonormal basis of the subspace spanned by the
vectors in Exercise 3.

8. Find an orthonormal basis of the subspace spanned by the
vectors in Exercise 4.


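Find an orthogonal basis for the column space of each matrix in
Exercises 9-12.

9. $\begin{bmatrix} 3 & -5 & 1 \\ 1 & 1 & 1 \\ -1 & 5 & -2 \\ 3 & -7 & 8 \end{bmatrix}$

10. $\begin{bmatrix} -1 & 6 & 6 \\ 3 & -8 & 3 \\ 1 & -2 & 6 \\ 1 & -4 & -3 \end{bmatrix}$

11. $\begin{bmatrix} 1 & 2 & 5 \\ -1 & 1 & -4 \\ -1 & 4 & -3 \\ 1 & -4 & 7 \\ 1 & 2 & 1 \end{bmatrix}$

12. $\begin{bmatrix} 1 & 3 & 5 \\ -1 & -3 & 1 \\ 0 & 2 & 3 \\ 1 & 5 & 2 \\ 1 & 5 & 8 \end{bmatrix}$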


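In Exercises 13 and 14, the columns of $Q$ were obtained by
applying the Gram-Schmidt process to the columns of $A$. Find an
upper triangular matrix $R$ such that $A = QR$. Check your work.

13. $A = \begin{bmatrix} 5 & 9 \\ 1 & 7 \\ -3 & -5 \\ 1 & 5 \end{bmatrix}$, $Q = \begin{bmatrix} 5/6 & -1/6 \\ 1/6 & 5/6 \\ -3/6 & 1/6 \\ 1/6 & 3/6 \end{bmatrix}$

14. $A = \begin{bmatrix} -2 & 3 \\ 5 & 7 \\ 2 & -2 \\ 4 & 6 \end{bmatrix}$, $Q = \begin{bmatrix} -2/7 & 5/7 \\ 5/7 & 2/7 \\ 2/7 & -4/7 \\ 4/7 & 2/7 \end{bmatrix}$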

15. Find a QR factorization of the matrix in Exercise 11. 

16. Find a QR factorization of the matrix in Exercise 12. 


In Exercises 17 and 18, all vectors and subspaces are in $\mathbb{R}^n$. Mark
each statement True or False. Justify each answer.


17. a. If $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is an orthogonal basis for $W$, then
multiplying $\mathbf{v}_3$ by a scalar $c$ gives a new orthogonal basis
$\{\mathbf{v}_1, \mathbf{v}_2, c\mathbf{v}_3\}$.

b. The Gram-Schmidt process produces from a linearly
independent set $\{\mathbf{x}_1, \dots, \mathbf{x}_p\}$ an orthogonal set $\{\mathbf{v}_1, \dots, \mathbf{v}_p\}$
with the property that for each $k$, the vectors $\mathbf{v}_1, \dots, \mathbf{v}_k$
span the same subspace as that spanned by $\mathbf{x}_1, \dots, \mathbf{x}_k$.

c. If $A = QR$, where $Q$ has orthonormal columns, then
$R = Q^TA$.

18. a. If $W = \operatorname{Span}\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3\}$ with $\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3\}$ linearly
independent, and if $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is an orthogonal set in $W$, then
$\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a basis for $W$.

b. If $\mathbf{x}$ is not in a subspace $W$, then $\mathbf{x} - \operatorname{proj}_W \mathbf{x}$ is not zero.

c. In a QR factorization, say $A = QR$ (when $A$ has linearly
independent columns), the columns of $Q$ form an
orthonormal basis for the column space of $A$.


19. Suppose $A = QR$, where $Q$ is $m \times n$ and $R$ is $n \times n$. Show
that if the columns of $A$ are linearly independent, then $R$ must
be invertible. [Hint: Study the equation $R\mathbf{x} = \mathbf{0}$ and use the
fact that $A = QR$.]

20. Suppose $A = QR$, where $R$ is an invertible matrix. Show
that $A$ and $Q$ have the same column space. [Hint: Given $\mathbf{y}$ in
Col $A$, show that $\mathbf{y} = Q\mathbf{x}$ for some $\mathbf{x}$. Also, given $\mathbf{y}$ in Col $Q$,
show that $\mathbf{y} = A\mathbf{x}$ for some $\mathbf{x}$.]


21. Given $A = QR$ as in Theorem 12, describe how to find an
orthogonal $m \times m$ (square) matrix $Q_1$ and an invertible $n \times n$
upper triangular matrix $R$ such that

$$A = Q_1 \begin{bmatrix} R \\ 0 \end{bmatrix}$$

The MATLAB qr command supplies this "full" QR factorization
when rank $A = n$.


22. Let $\mathbf{u}_1, \dots, \mathbf{u}_p$ be an orthogonal basis for a subspace $W$ of
$\mathbb{R}^n$, and let $T : \mathbb{R}^n \to \mathbb{R}^n$ be defined by $T(\mathbf{x}) = \operatorname{proj}_W \mathbf{x}$.
Show that $T$ is a linear transformation.


23. Suppose $A = QR$ is a QR factorization of an $m \times n$ matrix
$A$ (with linearly independent columns). Partition $A$ as
$[\,A_1 \;\; A_2\,]$, where $A_1$ has $p$ columns. Show how to obtain a
QR factorization of $A_1$, and explain why your factorization
has the appropriate properties.


24. [M] Use the Gram-Schmidt process as in Example 2 to
produce an orthogonal basis for the column space of

$$A = \begin{bmatrix} -10 & 13 & 7 & -11 \\ 2 & 1 & -5 & 3 \\ -6 & 3 & 13 & -3 \\ 16 & -16 & -2 & 5 \\ 2 & 1 & -5 & -7 \end{bmatrix}$$


25. [M] Use the method in this section to produce a QR factorization
of the matrix in Exercise 24.


26. [M] For a matrix program, the Gram-Schmidt process works
better with orthonormal vectors. Starting with $\mathbf{x}_1, \dots, \mathbf{x}_p$ as
in Theorem 11, let $A = [\,\mathbf{x}_1 \;\; \cdots \;\; \mathbf{x}_p\,]$. Suppose $Q$ is an
$n \times k$ matrix whose columns form an orthonormal basis for
the subspace $W_k$ spanned by the first $k$ columns of $A$. Then
for $\mathbf{x}$ in $\mathbb{R}^n$, $QQ^T\mathbf{x}$ is the orthogonal projection of $\mathbf{x}$ onto $W_k$
(Theorem 10 in Section 6.3). If $\mathbf{x}_{k+1}$ is the next column of $A$,
then equation (2) in the proof of Theorem 11 becomes

$$\mathbf{v}_{k+1} = \mathbf{x}_{k+1} - Q(Q^T\mathbf{x}_{k+1})$$

(The parentheses above reduce the number of arithmetic
operations.) Let $\mathbf{u}_{k+1} = \mathbf{v}_{k+1}/\|\mathbf{v}_{k+1}\|$. The new $Q$ for the
next step is $[\,Q \;\; \mathbf{u}_{k+1}\,]$. Use this procedure to compute the
QR factorization of the matrix in Exercise 24. Write the
keystrokes or commands you use.



SOLUTION TO PRACTICE PROBLEMS 


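1. Let $\mathbf{v}_1 = \mathbf{x}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ and $\mathbf{v}_2 = \mathbf{x}_2 - \dfrac{\mathbf{x}_2 \cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1}\,\mathbf{v}_1 = \mathbf{x}_2 - 0\mathbf{v}_1 = \mathbf{x}_2$. So $\{\mathbf{x}_1, \mathbf{x}_2\}$ is
already orthogonal. All that is needed is to normalize the vectors. Let

$$\mathbf{u}_1 = \frac{1}{\|\mathbf{v}_1\|}\,\mathbf{v}_1 = \frac{1}{\sqrt{3}}\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1/\sqrt{3} \\ 1/\sqrt{3} \\ 1/\sqrt{3} \end{bmatrix}$$

Instead of normalizing $\mathbf{v}_2$ directly, normalize $\mathbf{v}_2' = 3\mathbf{v}_2$ instead:

$$\mathbf{u}_2 = \frac{1}{\|\mathbf{v}_2'\|}\,\mathbf{v}_2' = \frac{1}{\sqrt{1^2 + 1^2 + (-2)^2}}\begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix} = \begin{bmatrix} 1/\sqrt{6} \\ 1/\sqrt{6} \\ -2/\sqrt{6} \end{bmatrix}$$

Then $\{\mathbf{u}_1, \mathbf{u}_2\}$ is an orthonormal basis for $W$.

2. Since the columns of $A$ are linearly dependent, there is a nonzero vector $\mathbf{x}$ such
that $A\mathbf{x} = \mathbf{0}$. But then $QR\mathbf{x} = \mathbf{0}$. Applying Theorem 7 from Section 6.2 results in
$\|R\mathbf{x}\| = \|QR\mathbf{x}\| = \|\mathbf{0}\| = 0$. But $\|R\mathbf{x}\| = 0$ implies $R\mathbf{x} = \mathbf{0}$, by Theorem 1 from
Section 6.1. Thus there is a nonzero vector $\mathbf{x}$ such that $R\mathbf{x} = \mathbf{0}$ and hence, by the
Invertible Matrix Theorem, $R$ cannot be invertible.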


6.5 LEAST-SQUARES PROBLEMS 


The chapter’s introductory example described a massive problem Ax = b that had no 
solution. Inconsistent systems arise often in applications, though usually not with such 
an enormous coefficient matrix. When a solution is demanded and none exists, the best 
one can do is to find an x that makes Ax as close as possible to b. 

Think of Ax as an approximation to b. The smaller the distance between b and Ax, 
given by ||b — Ax||, the better the approximation. The general least-squares problem 
is to find an x that makes ||b — Ax|| as small as possible. The adjective “least-squares” 
arises from the fact that ||b — Ax|| is the square root of a sum of squares. 


DEFINITION 


If $A$ is $m \times n$ and $\mathbf{b}$ is in $\mathbb{R}^m$, a least-squares solution of $A\mathbf{x} = \mathbf{b}$ is an $\hat{\mathbf{x}}$ in $\mathbb{R}^n$
such that

$$\|\mathbf{b} - A\hat{\mathbf{x}}\| \le \|\mathbf{b} - A\mathbf{x}\|$$

for all $\mathbf{x}$ in $\mathbb{R}^n$.


The most important aspect of the least-squares problem is that no matter what $\mathbf{x}$ we
select, the vector $A\mathbf{x}$ will necessarily be in the column space, Col $A$. So we seek an $\mathbf{x}$
that makes $A\mathbf{x}$ the closest point in Col $A$ to $\mathbf{b}$. See Figure 1. (Of course, if $\mathbf{b}$ happens to
be in Col $A$, then $\mathbf{b}$ is $A\mathbf{x}$ for some $\mathbf{x}$, and such an $\mathbf{x}$ is a "least-squares solution.")

Solution of the General Least-Squares Problem

Given $A$ and $\mathbf{b}$ as above, apply the Best Approximation Theorem in Section 6.3 to the
subspace Col $A$. Let

$$\hat{\mathbf{b}} = \operatorname{proj}_{\operatorname{Col} A} \mathbf{b}$$

FIGURE 1 The vector $\mathbf{b}$ is closer to $A\hat{\mathbf{x}}$
than to $A\mathbf{x}$ for other $\mathbf{x}$.


Because $\hat{\mathbf{b}}$ is in the column space of $A$, the equation $A\mathbf{x} = \hat{\mathbf{b}}$ is consistent, and there is
an $\hat{\mathbf{x}}$ in $\mathbb{R}^n$ such that

$$A\hat{\mathbf{x}} = \hat{\mathbf{b}} \qquad (1)$$

Since $\hat{\mathbf{b}}$ is the closest point in Col $A$ to $\mathbf{b}$, a vector $\hat{\mathbf{x}}$ is a least-squares solution of $A\mathbf{x} = \mathbf{b}$
if and only if $\hat{\mathbf{x}}$ satisfies (1). Such an $\hat{\mathbf{x}}$ in $\mathbb{R}^n$ is a list of weights that will build $\hat{\mathbf{b}}$ out of
the columns of $A$. See Figure 2. [There are many solutions of (1) if the equation has free
variables.]



Suppose $\hat{\mathbf{x}}$ satisfies $A\hat{\mathbf{x}} = \hat{\mathbf{b}}$. By the Orthogonal Decomposition Theorem in
Section 6.3, the projection $\hat{\mathbf{b}}$ has the property that $\mathbf{b} - \hat{\mathbf{b}}$ is orthogonal to Col $A$,
so $\mathbf{b} - A\hat{\mathbf{x}}$ is orthogonal to each column of $A$. If $\mathbf{a}_j$ is any column of $A$, then
$\mathbf{a}_j \cdot (\mathbf{b} - A\hat{\mathbf{x}}) = 0$, and $\mathbf{a}_j^T(\mathbf{b} - A\hat{\mathbf{x}}) = 0$. Since each $\mathbf{a}_j^T$ is a row of $A^T$,

$$A^T(\mathbf{b} - A\hat{\mathbf{x}}) = \mathbf{0} \qquad (2)$$

(This equation also follows from Theorem 3 in Section 6.1.) Thus

$$A^T\mathbf{b} - A^TA\hat{\mathbf{x}} = \mathbf{0}$$

$$A^TA\hat{\mathbf{x}} = A^T\mathbf{b}$$

These calculations show that each least-squares solution of $A\mathbf{x} = \mathbf{b}$ satisfies the equation

$$A^TA\mathbf{x} = A^T\mathbf{b} \qquad (3)$$

The matrix equation (3) represents a system of equations called the normal equations
for $A\mathbf{x} = \mathbf{b}$. A solution of (3) is often denoted by $\hat{\mathbf{x}}$.


THEOREM 13

The set of least-squares solutions of $A\mathbf{x} = \mathbf{b}$ coincides with the nonempty set of
solutions of the normal equations $A^TA\mathbf{x} = A^T\mathbf{b}$.


PROOF As shown above, the set of least-squares solutions is nonempty and each
least-squares solution $\hat{\mathbf{x}}$ satisfies the normal equations. Conversely, suppose $\hat{\mathbf{x}}$ satisfies
$A^TA\hat{\mathbf{x}} = A^T\mathbf{b}$. Then $\hat{\mathbf{x}}$ satisfies (2) above, which shows that $\mathbf{b} - A\hat{\mathbf{x}}$ is orthogonal to the
rows of $A^T$ and hence is orthogonal to the columns of $A$. Since the columns of $A$ span
Col $A$, the vector $\mathbf{b} - A\hat{\mathbf{x}}$ is orthogonal to all of Col $A$. Hence the equation

$$\mathbf{b} = A\hat{\mathbf{x}} + (\mathbf{b} - A\hat{\mathbf{x}})$$

is a decomposition of $\mathbf{b}$ into the sum of a vector in Col $A$ and a vector orthogonal to
Col $A$. By the uniqueness of the orthogonal decomposition, $A\hat{\mathbf{x}}$ must be the orthogonal
projection of $\mathbf{b}$ onto Col $A$. That is, $A\hat{\mathbf{x}} = \hat{\mathbf{b}}$, and $\hat{\mathbf{x}}$ is a least-squares solution. ■


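EXAMPLE 1 Find a least-squares solution of the inconsistent system $A\mathbf{x} = \mathbf{b}$ for

$$A = \begin{bmatrix} 4 & 0 \\ 0 & 2 \\ 1 & 1 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 2 \\ 0 \\ 11 \end{bmatrix}$$

SOLUTION To use normal equations (3), compute:

$$A^TA = \begin{bmatrix} 4 & 0 & 1 \\ 0 & 2 & 1 \end{bmatrix}\begin{bmatrix} 4 & 0 \\ 0 & 2 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 17 & 1 \\ 1 & 5 \end{bmatrix}, \qquad A^T\mathbf{b} = \begin{bmatrix} 4 & 0 & 1 \\ 0 & 2 & 1 \end{bmatrix}\begin{bmatrix} 2 \\ 0 \\ 11 \end{bmatrix} = \begin{bmatrix} 19 \\ 11 \end{bmatrix}$$

Then the equation $A^TA\mathbf{x} = A^T\mathbf{b}$ becomes

$$\begin{bmatrix} 17 & 1 \\ 1 & 5 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 19 \\ 11 \end{bmatrix}$$

Row operations can be used to solve this system, but since $A^TA$ is invertible and $2 \times 2$,
it is probably faster to compute

$$(A^TA)^{-1} = \frac{1}{84}\begin{bmatrix} 5 & -1 \\ -1 & 17 \end{bmatrix}$$

and then to solve $A^TA\mathbf{x} = A^T\mathbf{b}$ as

$$\hat{\mathbf{x}} = (A^TA)^{-1}A^T\mathbf{b} = \frac{1}{84}\begin{bmatrix} 5 & -1 \\ -1 & 17 \end{bmatrix}\begin{bmatrix} 19 \\ 11 \end{bmatrix} = \frac{1}{84}\begin{bmatrix} 84 \\ 168 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \qquad \blacksquare$$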
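
The normal-equations calculation of Example 1 can be reproduced with a short script. This is a sketch under stated assumptions: the helper names `normal_equations` and `solve` are invented for this illustration, exact fractions are used, and the solver assumes $A^TA$ is invertible (as it is here), so it does not handle the free-variable case that arises in Example 2.

```python
from fractions import Fraction


def normal_equations(A, b):
    """Return (A^T A, A^T b) for a matrix A given as a list of rows."""
    n = len(A[0])
    AtA = [[sum(Fraction(row[i]) * row[j] for row in A) for j in range(n)]
           for i in range(n)]
    Atb = [sum(Fraction(row[i]) * bi for row, bi in zip(A, b)) for i in range(n)]
    return AtA, Atb


def solve(M, rhs):
    """Solve M x = rhs by Gauss-Jordan elimination; M is assumed invertible."""
    n = len(M)
    aug = [list(row) + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [entry / aug[col][col] for entry in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


# Data of Example 1.
A = [[4, 0], [0, 2], [1, 1]]
b = [2, 0, 11]

AtA, Atb = normal_equations(A, b)
x_hat = solve(AtA, Atb)
print(AtA, Atb, x_hat)
```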


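In many calculations, $A^TA$ is invertible, but this is not always the case. The next
example involves a matrix of the sort that appears in what are called analysis of variance
problems in statistics.


EXAMPLE 2 Find a least-squares solution of $A\mathbf{x} = \mathbf{b}$ for

$$A = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} -3 \\ -1 \\ 0 \\ 2 \\ 5 \\ 1 \end{bmatrix}$$

SOLUTION Compute

$$A^TA = \begin{bmatrix} 6 & 2 & 2 & 2 \\ 2 & 2 & 0 & 0 \\ 2 & 0 & 2 & 0 \\ 2 & 0 & 0 & 2 \end{bmatrix}, \qquad A^T\mathbf{b} = \begin{bmatrix} 4 \\ -4 \\ 2 \\ 6 \end{bmatrix}$$

The augmented matrix for $A^TA\mathbf{x} = A^T\mathbf{b}$ is

$$\begin{bmatrix} 6 & 2 & 2 & 2 & 4 \\ 2 & 2 & 0 & 0 & -4 \\ 2 & 0 & 2 & 0 & 2 \\ 2 & 0 & 0 & 2 & 6 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 0 & 1 & 3 \\ 0 & 1 & 0 & -1 & -5 \\ 0 & 0 & 1 & -1 & -2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

The general solution is $x_1 = 3 - x_4$, $x_2 = -5 + x_4$, $x_3 = -2 + x_4$, and $x_4$ is free. So
the general least-squares solution of $A\mathbf{x} = \mathbf{b}$ has the form

$$\hat{\mathbf{x}} = \begin{bmatrix} 3 \\ -5 \\ -2 \\ 0 \end{bmatrix} + x_4\begin{bmatrix} -1 \\ 1 \\ 1 \\ 1 \end{bmatrix} \qquad \blacksquare$$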


The next theorem gives useful criteria for determining when there is only one
least-squares solution of $A\mathbf{x} = \mathbf{b}$. (Of course, the orthogonal projection $\hat{\mathbf{b}}$ is always unique.)


THEOREM 14

Let $A$ be an $m \times n$ matrix. The following statements are logically equivalent:

a. The equation $A\mathbf{x} = \mathbf{b}$ has a unique least-squares solution for each $\mathbf{b}$ in $\mathbb{R}^m$.

b. The columns of $A$ are linearly independent.

c. The matrix $A^TA$ is invertible.

When these statements are true, the least-squares solution $\hat{\mathbf{x}}$ is given by

$$\hat{\mathbf{x}} = (A^TA)^{-1}A^T\mathbf{b} \qquad (4)$$


The main elements of a proof of Theorem 14 are outlined in Exercises 19-21, which
also review concepts from Chapter 4. Formula (4) for $\hat{\mathbf{x}}$ is useful mainly for theoretical
purposes and for hand calculations when $A^TA$ is a $2 \times 2$ invertible matrix.

When a least-squares solution $\hat{\mathbf{x}}$ is used to produce $A\hat{\mathbf{x}}$ as an approximation to $\mathbf{b}$,
the distance from $\mathbf{b}$ to $A\hat{\mathbf{x}}$ is called the least-squares error of this approximation.


EXAMPLE 3 Given $A$ and $\mathbf{b}$ as in Example 1, determine the least-squares error in
the least-squares solution of $A\mathbf{x} = \mathbf{b}$.

FIGURE 3

SOLUTION From Example 1,

$$\mathbf{b} = \begin{bmatrix} 2 \\ 0 \\ 11 \end{bmatrix} \quad \text{and} \quad A\hat{\mathbf{x}} = \begin{bmatrix} 4 & 0 \\ 0 & 2 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \\ 3 \end{bmatrix}$$

Hence

$$\mathbf{b} - A\hat{\mathbf{x}} = \begin{bmatrix} 2 \\ 0 \\ 11 \end{bmatrix} - \begin{bmatrix} 4 \\ 4 \\ 3 \end{bmatrix} = \begin{bmatrix} -2 \\ -4 \\ 8 \end{bmatrix}$$

and

$$\|\mathbf{b} - A\hat{\mathbf{x}}\| = \sqrt{(-2)^2 + (-4)^2 + 8^2} = \sqrt{84}$$

The least-squares error is $\sqrt{84}$. For any $\mathbf{x}$ in $\mathbb{R}^2$, the distance between $\mathbf{b}$ and the vector
$A\mathbf{x}$ is at least $\sqrt{84}$. See Figure 3. Note that the least-squares solution $\hat{\mathbf{x}}$ itself does not
appear in the figure. ■


Alternative Calculations of Least-Squares Solutions 

The next example shows how to find a least-squares solution of v4x = b when the 
columns of A are orthogonal. Such matrices often appear in linear regression problems, 
discussed in the next section. 


EXAM PLE 4 Find a least-squares solution of v4x = b for 



SOLUTION Because the columns ai and a 2 of A are orthogonal, the orthogonal 
projection of b onto Col A is given by 


- b-ai b-a2 8 45 

b = - ai H- a 2 = -ai + —-a 2 

ai • ai a2 • a2 4 90 



"2" 


-3 


-1 

2 

+ 

-1 


1 

2 

1/2 


5/2 

2 


7/2 


11 /2 


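Now that $\hat{b}$ is known, we can solve $A\hat{x} = \hat{b}$. But this is trivial, since we already
know what weights to place on the columns of A to produce $\hat{b}$. It is clear from (5) that

\[ \hat{x} = \begin{bmatrix} 8/4 \\ 45/90 \end{bmatrix} = \begin{bmatrix} 2 \\ 1/2 \end{bmatrix} \]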
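The shortcut for orthogonal columns can be sketched in a few lines. This is an illustration only: the function name `least_squares_orthogonal` is hypothetical, and the columns and right-hand side are assumed to be those of Example 4. Each weight is the ratio $(b \cdot a_j)/(a_j \cdot a_j)$, so no system of equations has to be solved.

```python
from fractions import Fraction

def dot(u, v):
    """Standard inner product of two vectors of equal length."""
    return sum(ui * vi for ui, vi in zip(u, v))

def least_squares_orthogonal(columns, b):
    """Weights that project b onto the span of mutually orthogonal nonzero columns."""
    # The formula is valid only when the columns are pairwise orthogonal.
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if dot(columns[i], columns[j]) != 0:
                raise ValueError("columns are not orthogonal")
    return [Fraction(dot(b, a), dot(a, a)) for a in columns]

# Columns of A and the vector b, assumed from Example 4.
a1 = [1, 1, 1, 1]
a2 = [-6, -2, 1, 7]
b = [-1, 2, 1, 6]

x_hat = least_squares_orthogonal([a1, a2], b)                 # weights on a1 and a2
b_hat = [x_hat[0] * p + x_hat[1] * q for p, q in zip(a1, a2)]  # projection of b

print(x_hat, b_hat)
```

The orthogonality check is included because the formula silently gives a wrong answer when the columns are not orthogonal.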



In some cases, the normal equations for a least-squares problem can be
ill-conditioned; that is, small errors in the calculations of the entries of $A^T A$ can sometimes
cause relatively large errors in the solution $\hat{x}$. If the columns of A are linearly
independent, the least-squares solution can often be computed more reliably through a QR
factorization of A (described in Section 6.4).¹


¹ The QR method is compared with the standard normal equation method in G. Golub and C. Van Loan,
Matrix Computations, 3rd ed. (Baltimore: Johns Hopkins Press, 1996), pp. 230-231.

THEOREM 15


Given an $m \times n$ matrix A with linearly independent columns, let A = QR be a
QR factorization of A as in Theorem 12. Then, for each b in $\mathbb{R}^m$, the equation
Ax = b has a unique least-squares solution, given by

\[ \hat{x} = R^{-1} Q^T b \tag{6} \]


PROOF Let $\hat{x} = R^{-1} Q^T b$. Then

\[ A\hat{x} = QR\hat{x} = QRR^{-1}Q^T b = QQ^T b \]

By Theorem 12, the columns of Q form an orthonormal basis for Col A. Hence, by
Theorem 10, $QQ^T b$ is the orthogonal projection $\hat{b}$ of b onto Col A. Then $A\hat{x} = \hat{b}$, which
shows that $\hat{x}$ is a least-squares solution of Ax = b. The uniqueness of $\hat{x}$ follows from
Theorem 14. ■


NUMERICAL NOTE - 

Since R in Theorem 15 is upper triangular, $\hat{x}$ should be calculated as the exact
solution of the equation

\[ Rx = Q^T b \tag{7} \]

It is much faster to solve (7) by back-substitution or row operations than to
compute $R^{-1}$ and use (6).


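EXAMPLE 5 Find the least-squares solution of Ax = b for

\[ A = \begin{bmatrix} 1 & 3 & 5 \\ 1 & 1 & 0 \\ 1 & 1 & 2 \\ 1 & 3 & 3 \end{bmatrix}, \quad b = \begin{bmatrix} 3 \\ 5 \\ 7 \\ -3 \end{bmatrix} \]

SOLUTION The QR factorization of A can be obtained as in Section 6.4.

\[ A = QR = \begin{bmatrix} 1/2 & 1/2 & 1/2 \\ 1/2 & -1/2 & -1/2 \\ 1/2 & -1/2 & 1/2 \\ 1/2 & 1/2 & -1/2 \end{bmatrix} \begin{bmatrix} 2 & 4 & 5 \\ 0 & 2 & 3 \\ 0 & 0 & 2 \end{bmatrix} \]

Then

\[ Q^T b = \begin{bmatrix} 1/2 & 1/2 & 1/2 & 1/2 \\ 1/2 & -1/2 & -1/2 & 1/2 \\ 1/2 & -1/2 & 1/2 & -1/2 \end{bmatrix} \begin{bmatrix} 3 \\ 5 \\ 7 \\ -3 \end{bmatrix} = \begin{bmatrix} 6 \\ -6 \\ 4 \end{bmatrix} \]

The least-squares solution $\hat{x}$ satisfies $Rx = Q^T b$; that is,

\[ \begin{bmatrix} 2 & 4 & 5 \\ 0 & 2 & 3 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 6 \\ -6 \\ 4 \end{bmatrix} \]

This equation is solved easily and yields $\hat{x} = \begin{bmatrix} 10 \\ -6 \\ 2 \end{bmatrix}$.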
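The QR approach of Theorem 15 and the Numerical Note can be sketched as follows. This is only an illustration under stated assumptions: Q, R, and b are taken to be the matrices of Example 5, the factorization is supplied rather than computed, and `back_substitute` is a hypothetical helper name. The point of the sketch is that $R^{-1}$ is never formed; the triangular system $Rx = Q^T b$ is solved from the last equation upward.

```python
from fractions import Fraction

def back_substitute(R, c):
    """Solve R x = c for an upper triangular R with nonzero diagonal entries."""
    n = len(R)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        # Subtract the contribution of the unknowns already found.
        s = sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = Fraction(c[i] - s) / R[i][i]
    return x

half = Fraction(1, 2)
# Q, R, and b assumed from Example 5 (A = QR with orthonormal columns in Q).
Q = [[half,  half,  half],
     [half, -half, -half],
     [half, -half,  half],
     [half,  half, -half]]
R = [[2, 4, 5],
     [0, 2, 3],
     [0, 0, 2]]
b = [3, 5, 7, -3]

# Right-hand side Q^T b of equation (7).
Qtb = [sum(Q[i][j] * b[i] for i in range(len(b))) for j in range(len(R))]
x_hat = back_substitute(R, Qtb)

print(Qtb, x_hat)
```

Back-substitution on an n x n triangular system needs on the order of n² arithmetic operations, which is the reason the note prefers it to computing an inverse.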
PRACTICE PROBLEMS

1. Let $A = \begin{bmatrix} 1 & -3 & -3 \\ 1 & 5 & 1 \\ 1 & 7 & 2 \end{bmatrix}$ and $b = \begin{bmatrix} 5 \\ -3 \\ -5 \end{bmatrix}$. Find a least-squares solution of Ax = b,
and compute the associated least-squares error.

2. What can you say about the least-squares solution of Ax = b when b is orthogonal
to the columns of A?


6.5 EXERCISES 


In Exercises 1-4, find a least-squares solution of Ax = b by 
(a) constructing the normal equations for x and (b) solving for x. 



In Exercises 5 and 6, describe all least-squares solutions of the 
equation Ax = b. 


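5. $A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix}$, $b = \begin{bmatrix} 1 \\ 3 \\ 8 \\ 2 \end{bmatrix}$

6. $A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix}$, $b = \begin{bmatrix} 7 \\ 2 \\ 3 \\ 6 \\ 5 \\ 4 \end{bmatrix}$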


7. Compute the least-squares error associated with the least- 
squares solution found in Exercise 3. 


8. Compute the least-squares error associated with the least- 
squares solution found in Exercise 4. 


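In Exercises 9-12, find (a) the orthogonal projection of b onto
Col A and (b) a least-squares solution of Ax = b.

9. $A = \begin{bmatrix} 1 & 5 \\ 3 & 1 \\ -2 & 4 \end{bmatrix}$, $b = \begin{bmatrix} 4 \\ -2 \\ -3 \end{bmatrix}$

10. $A = \begin{bmatrix} 1 & 2 \\ -1 & 4 \\ 1 & 2 \end{bmatrix}$, $b = \begin{bmatrix} 3 \\ -1 \\ 5 \end{bmatrix}$

11. $A = \begin{bmatrix} 4 & 0 & 1 \\ 1 & -5 & 1 \\ 6 & 1 & 0 \\ 1 & -1 & -5 \end{bmatrix}$, $b = \begin{bmatrix} 9 \\ 0 \\ 0 \\ 0 \end{bmatrix}$

12. $A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & -1 \\ 0 & 1 & 1 \\ -1 & 1 & -1 \end{bmatrix}$, $b = \begin{bmatrix} 2 \\ 5 \\ 6 \\ 6 \end{bmatrix}$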




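13. Let $A = \begin{bmatrix} 3 & 4 \\ -2 & 1 \\ 3 & 4 \end{bmatrix}$, $b = \begin{bmatrix} 11 \\ -9 \\ 5 \end{bmatrix}$, $u = \begin{bmatrix} 5 \\ -1 \end{bmatrix}$, and $v = \begin{bmatrix} 5 \\ -2 \end{bmatrix}$. Compute Au and Av, and compare them with b.
Could u possibly be a least-squares solution of Ax = b?
(Answer this without computing a least-squares solution.)

14. Let $A = \begin{bmatrix} 2 & 1 \\ -3 & -4 \\ 3 & 2 \end{bmatrix}$, $b = \begin{bmatrix} 5 \\ 4 \\ 4 \end{bmatrix}$, $u = \begin{bmatrix} 4 \\ -5 \end{bmatrix}$, and $v = \begin{bmatrix} 6 \\ -5 \end{bmatrix}$. Compute Au and Av, and compare them with b. Is
it possible that at least one of u or v could be a least-squares
solution of Ax = b? (Answer this without computing a least-squares
solution.)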


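In Exercises 15 and 16, use the factorization A = QR to find the
least-squares solution of Ax = b.

15. $A = \begin{bmatrix} 2 & 3 \\ 2 & 4 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 2/3 & -1/3 \\ 2/3 & 2/3 \\ 1/3 & -2/3 \end{bmatrix} \begin{bmatrix} 3 & 5 \\ 0 & 1 \end{bmatrix}$, $b = \begin{bmatrix} 7 \\ 3 \\ 1 \end{bmatrix}$

16. $A = \begin{bmatrix} 1 & -1 \\ 1 & 4 \\ 1 & -1 \\ 1 & 4 \end{bmatrix} = \begin{bmatrix} 1/2 & -1/2 \\ 1/2 & 1/2 \\ 1/2 & -1/2 \\ 1/2 & 1/2 \end{bmatrix} \begin{bmatrix} 2 & 3 \\ 0 & 5 \end{bmatrix}$, $b = \begin{bmatrix} -1 \\ 6 \\ 5 \\ 7 \end{bmatrix}$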


In Exercises 17 and 18, A is an $m \times n$ matrix and b is in $\mathbb{R}^m$. Mark
each statement True or False. Justify each answer.

17. a. The general least-squares problem is to find an x that
makes Ax as close as possible to b.

b. A least-squares solution of Ax = b is a vector $\hat{x}$ that
satisfies $A\hat{x} = \hat{b}$, where $\hat{b}$ is the orthogonal projection of
b onto Col A.

c. A least-squares solution of Ax = b is a vector $\hat{x}$ such that
$\|b - Ax\| \le \|b - A\hat{x}\|$ for all x in $\mathbb{R}^n$.

d. Any solution of $A^T A x = A^T b$ is a least-squares solution
of Ax = b.

e. If the columns of A are linearly independent, then the
equation Ax = b has exactly one least-squares solution.

18. a. If b is in the column space of A, then every solution of
Ax = b is a least-squares solution.

b. The least-squares solution of Ax = b is the point in the
column space of A closest to b.

c. A least-squares solution of Ax = b is a list of weights
that, when applied to the columns of A, produces the
orthogonal projection of b onto Col A.

d. If $\hat{x}$ is a least-squares solution of Ax = b, then
$\hat{x} = (A^T A)^{-1} A^T b$.

e. The normal equations always provide a reliable method
for computing least-squares solutions.

f. If A has a QR factorization, say A = QR, then the best
way to find the least-squares solution of Ax = b is to
compute $\hat{x} = R^{-1} Q^T b$.

19. Let A be an $m \times n$ matrix. Use the steps below to show that a
vector x in $\mathbb{R}^n$ satisfies Ax = 0 if and only if $A^T A x = 0$. This
will show that Nul A = Nul $A^T A$.

a. Show that if Ax = 0, then $A^T A x = 0$.

b. Suppose $A^T A x = 0$. Explain why $x^T A^T A x = 0$, and use
this to show that Ax = 0.

20. Let A be an $m \times n$ matrix such that $A^T A$ is invertible. Show
that the columns of A are linearly independent. [Careful:
You may not assume that A is invertible; it may not even be
square.]

21. Let A be an $m \times n$ matrix whose columns are linearly
independent. [Careful: A need not be square.]

a. Use Exercise 19 to show that $A^T A$ is an invertible matrix.

b. Explain why A must have at least as many rows as
columns.

c. Determine the rank of A.

22. Use Exercise 19 to show that rank $A^T A$ = rank A. [Hint: How
many columns does $A^T A$ have? How is this connected with
the rank of $A^T A$?]

23. Suppose A is $m \times n$ with linearly independent columns and
b is in $\mathbb{R}^m$. Use the normal equations to produce a formula
for $\hat{b}$, the projection of b onto Col A. [Hint: Find $\hat{x}$ first. The
formula does not require an orthogonal basis for Col A.]


24. Find a formula for the least-squares solution of Ax = b when 
the columns of A are orthonormal. 

25. Describe all least-squares solutions of the system 


x + y = 2 
x + y = 4 

26. [M] Example 3 in Section 4.8 displayed a low-pass linear
filter that changed a signal $\{y_k\}$ into $\{y_{k+1}\}$ and changed a
higher-frequency signal $\{w_k\}$ into the zero signal, where
$y_k = \cos(\pi k/4)$ and $w_k = \cos(3\pi k/4)$. The following
calculations will design a filter with approximately those
properties. The filter equation is

\[ a_0 y_{k+2} + a_1 y_{k+1} + a_2 y_k = z_k \quad \text{for all } k \tag{8} \]

Because the signals are periodic, with period 8, it suffices
to study equation (8) for k = 0,..., 7. The action on the
two signals described above translates into two sets of eight
equations, shown below:

Write an equation Ax = b, where A is a $16 \times 3$ matrix formed
from the two coefficient matrices above and where b in $\mathbb{R}^{16}$ is
formed from the two right sides of the equations. Find $a_0$, $a_1$,
and $a_2$ given by the least-squares solution of Ax = b. (The
.7 in the data above was used as an approximation for $\sqrt{2}/2$,
to illustrate how a typical computation in an applied problem
might proceed. If .707 were used instead, the resulting filter
coefficients would agree to at least seven decimal places
with $\sqrt{2}/4$, 1/2, and $\sqrt{2}/4$, the values produced by exact
arithmetic calculations.)

SOLUTIONS TO PRACTICE PROBLEMS


1. First, compute

\[ A^T A = \begin{bmatrix} 1 & 1 & 1 \\ -3 & 5 & 7 \\ -3 & 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & -3 & -3 \\ 1 & 5 & 1 \\ 1 & 7 & 2 \end{bmatrix} = \begin{bmatrix} 3 & 9 & 0 \\ 9 & 83 & 28 \\ 0 & 28 & 14 \end{bmatrix} \]

\[ A^T b = \begin{bmatrix} 1 & 1 & 1 \\ -3 & 5 & 7 \\ -3 & 1 & 2 \end{bmatrix} \begin{bmatrix} 5 \\ -3 \\ -5 \end{bmatrix} = \begin{bmatrix} -3 \\ -65 \\ -28 \end{bmatrix} \]

Next, row reduce the augmented matrix for the normal equations, $A^T A x = A^T b$:

\[ \begin{bmatrix} 3 & 9 & 0 & -3 \\ 9 & 83 & 28 & -65 \\ 0 & 28 & 14 & -28 \end{bmatrix} \sim \begin{bmatrix} 1 & 3 & 0 & -1 \\ 0 & 56 & 28 & -56 \\ 0 & 28 & 14 & -28 \end{bmatrix} \sim \cdots \sim \begin{bmatrix} 1 & 0 & -3/2 & 2 \\ 0 & 1 & 1/2 & -1 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]

The general least-squares solution is $x_1 = 2 + \frac{3}{2} x_3$, $x_2 = -1 - \frac{1}{2} x_3$, with $x_3$ free.

For one specific solution, take $x_3 = 0$ (for example), and get

\[ \hat{x} = \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix} \]

To find the least-squares error, compute

\[ \hat{b} = A\hat{x} = \begin{bmatrix} 1 & -3 & -3 \\ 1 & 5 & 1 \\ 1 & 7 & 2 \end{bmatrix} \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix} = \begin{bmatrix} 5 \\ -3 \\ -5 \end{bmatrix} \]

It turns out that $\hat{b} = b$, so $\|b - \hat{b}\| = 0$. The least-squares error is zero because b
happens to be in Col A.

2. If b is orthogonal to the columns of A, then the projection of b onto the column space
of A is 0. In this case, a least-squares solution $\hat{x}$ of Ax = b satisfies $A\hat{x} = 0$.


6.6 APPLICATIONS TO LINEAR MODELS 


A common task in science and engineering is to analyze and understand relationships 
among several quantities that vary. This section describes a variety of situations in which 
data are used to build or verify a formula that predicts the value of one variable as a 
function of other variables. In each case, the problem will amount to solving a least- 
squares problem. 

For easy application of the discussion to real problems that you may encounter later
in your career, we choose notation that is commonly used in the statistical analysis of
scientific and engineering data. Instead of Ax = b, we write $X\beta = y$ and refer to X as
the design matrix, $\beta$ as the parameter vector, and y as the observation vector.

Least-Squares Lines 

The simplest relation between two variables x and y is the linear equation
$y = \beta_0 + \beta_1 x$.¹ Experimental data often produce points $(x_1, y_1), \ldots, (x_n, y_n)$ that,
when graphed, seem to lie close to a line. We want to determine the parameters $\beta_0$
and $\beta_1$ that make the line as “close” to the points as possible.

¹ This notation is commonly used for least-squares lines instead of y = mx + b.

Suppose $\beta_0$ and $\beta_1$ are fixed, and consider the line $y = \beta_0 + \beta_1 x$ in Figure 1.
Corresponding to each data point $(x_j, y_j)$ there is a point $(x_j, \beta_0 + \beta_1 x_j)$ on the line
with the same x-coordinate. We call $y_j$ the observed value of y and $\beta_0 + \beta_1 x_j$ the
predicted y-value (determined by the line). The difference between an observed y-value
and a predicted y-value is called a residual.

FIGURE 1 Fitting a line to experimental data. 


There are several ways to measure how “close” the line is to the data. The usual
choice (primarily because the mathematical calculations are simple) is to add the squares
of the residuals. The least-squares line is the line $y = \beta_0 + \beta_1 x$ that minimizes the
sum of the squares of the residuals. This line is also called a line of regression of y
on x, because any errors in the data are assumed to be only in the y-coordinates. The
coefficients $\beta_0$, $\beta_1$ of the line are called (linear) regression coefficients.²

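If the data points were on the line, the parameters $\beta_0$ and $\beta_1$ would satisfy the
equations

    Predicted y-value              Observed y-value
    $\beta_0 + \beta_1 x_1$    =   $y_1$
    $\beta_0 + \beta_1 x_2$    =   $y_2$
            ...                       ...
    $\beta_0 + \beta_1 x_n$    =   $y_n$

We can write this system as

\[ X\beta = y, \quad \text{where } X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \tag{1} \]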

Of course, if the data points don’t lie on a line, then there are no parameters $\beta_0$, $\beta_1$ for
which the predicted y-values in $X\beta$ equal the observed y-values in y, and $X\beta = y$ has
no solution. This is a least-squares problem, Ax = b, with different notation!

The square of the distance between the vectors $X\beta$ and y is precisely the sum of
the squares of the residuals. The $\beta$ that minimizes this sum also minimizes the distance
between $X\beta$ and y. Computing the least-squares solution of $X\beta = y$ is equivalent to
finding the $\beta$ that determines the least-squares line in Figure 1.


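² If the measurement errors are in x instead of y, simply interchange the coordinates of the data $(x_j, y_j)$
before plotting the points and computing the regression line. If both coordinates are subject to possible error,
then you might choose the line that minimizes the sum of the squares of the orthogonal (perpendicular)
distances from the points to the line. See the Practice Problems for Section 7.5.

EXAMPLE 1 Find the equation $y = \beta_0 + \beta_1 x$ of the least-squares line that best fits
the data points (2, 1), (5, 2), (7, 3), and (8, 3).

SOLUTION Use the x-coordinates of the data to build the design matrix X in (1) and
the y-coordinates to build the observation vector y:

\[ X = \begin{bmatrix} 1 & 2 \\ 1 & 5 \\ 1 & 7 \\ 1 & 8 \end{bmatrix}, \quad y = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 3 \end{bmatrix} \]

For the least-squares solution of $X\beta = y$, obtain the normal equations (with the new
notation):

\[ X^T X \beta = X^T y \]

That is, compute

\[ X^T X = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 5 & 7 & 8 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 1 & 5 \\ 1 & 7 \\ 1 & 8 \end{bmatrix} = \begin{bmatrix} 4 & 22 \\ 22 & 142 \end{bmatrix}, \quad X^T y = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 5 & 7 & 8 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \\ 3 \end{bmatrix} = \begin{bmatrix} 9 \\ 57 \end{bmatrix} \]

The normal equations are

\[ \begin{bmatrix} 4 & 22 \\ 22 & 142 \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} = \begin{bmatrix} 9 \\ 57 \end{bmatrix} \]

Hence

\[ \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} = \frac{1}{84} \begin{bmatrix} 142 & -22 \\ -22 & 4 \end{bmatrix} \begin{bmatrix} 9 \\ 57 \end{bmatrix} = \frac{1}{84} \begin{bmatrix} 24 \\ 30 \end{bmatrix} = \begin{bmatrix} 2/7 \\ 5/14 \end{bmatrix} \]

Thus the least-squares line has the equation

\[ y = \frac{2}{7} + \frac{5}{14} x \]

See Figure 2.

FIGURE 2 The least-squares line $y = \frac{2}{7} + \frac{5}{14} x$.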

A common practice before computing a least-squares line is to compute the average
$\bar{x}$ of the original x-values and form a new variable $x^* = x - \bar{x}$. The new x-data are said
to be in mean-deviation form. In this case, the two columns of the design matrix will
be orthogonal. Solution of the normal equations is simplified, just as in Example 4 in
Section 6.5. See Exercises 17 and 18.
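The effect of mean-deviation form can be sketched as follows. This is an illustration under assumptions: the data are those of Example 1, and the function name `fit_line_mean_deviation` is hypothetical. Because the shifted x-values sum to zero, the column of ones and the column of shifted values are orthogonal, and each coefficient is a single ratio of dot products.

```python
from fractions import Fraction

def fit_line_mean_deviation(points):
    """Fit y = beta0 + beta1 * (x - x_bar) using orthogonality of the two columns."""
    n = len(points)
    x_bar = Fraction(sum(x for x, _ in points), n)
    x_star = [x - x_bar for x, _ in points]   # x-data in mean-deviation form
    ys = [y for _, y in points]
    # Dot product of the column of ones with the x_star column; it is always zero.
    col_dot = sum(x_star)
    beta0 = Fraction(sum(ys), n)              # (1 . y) / (1 . 1), the mean of y
    beta1 = sum(s * y for s, y in zip(x_star, ys)) / sum(s * s for s in x_star)
    return x_bar, col_dot, beta0, beta1

data = [(2, 1), (5, 2), (7, 3), (8, 3)]       # data assumed from Example 1
x_bar, col_dot, beta0, beta1 = fit_line_mean_deviation(data)

# Converting back to the original variable x recovers the intercept of the line.
intercept = beta0 - beta1 * x_bar

print(x_bar, col_dot, beta0, beta1, intercept)
```

The slope is unchanged by the shift; only the intercept differs between the two forms of the line.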
The General Linear Model

In some applications, it is necessary to fit data points with something other than a straight
line. In the examples that follow, the matrix equation is still $X\beta = y$, but the specific
form of X changes from one problem to the next. Statisticians usually introduce a
residual vector $\epsilon$, defined by $\epsilon = y - X\beta$, and write

\[ y = X\beta + \epsilon \]

Any equation of this form is referred to as a linear model. Once X and y are determined,
the goal is to minimize the length of $\epsilon$, which amounts to finding a least-squares solution
of $X\beta = y$. In each case, the least-squares solution $\hat{\beta}$ is a solution of the normal
equations

\[ X^T X \beta = X^T y \]


Least-Squares Fitting of Other Curves 


FIGURE 3 Average cost curve.

When data points $(x_1, y_1), \ldots, (x_n, y_n)$ on a scatter plot do not lie close to any line, it
may be appropriate to postulate some other functional relationship between x and y.
The next two examples show how to fit data by curves that have the general form

\[ y = \beta_0 f_0(x) + \beta_1 f_1(x) + \cdots + \beta_k f_k(x) \tag{2} \]

where $f_0, \ldots, f_k$ are known functions and $\beta_0, \ldots, \beta_k$ are parameters that must be
determined. As we will see, equation (2) describes a linear model because it is linear in
the unknown parameters.

For a particular value of x, (2) gives a predicted, or “fitted,” value of y. The
difference between the observed value and the predicted value is the residual. The
parameters $\beta_0, \ldots, \beta_k$ must be determined so as to minimize the sum of the squares
of the residuals.
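A fit of the general form (2) can be sketched by building the design matrix column by column from the known functions. Everything below is illustrative: the function names, the Gauss-Jordan solver, and the sample data (chosen to lie exactly on $y = 1 + 2x + 3x^2$) are assumptions made for the sketch, not material from the text. The solver assumes $X^T X$ is invertible.

```python
from fractions import Fraction

def design_matrix(xs, funcs):
    """Row j holds f_0(x_j), ..., f_k(x_j)."""
    return [[Fraction(f(x)) for f in funcs] for x in xs]

def solve(G, h):
    """Gauss-Jordan elimination on [G | h]; assumes G is square and invertible."""
    n = len(G)
    M = [list(row) + [rhs] for row, rhs in zip(G, h)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and swap it into place.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [row[-1] for row in M]

def least_squares_fit(xs, ys, funcs):
    """Solve the normal equations X^T X beta = X^T y for the given basis functions."""
    X = design_matrix(xs, funcs)
    k = len(funcs)
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(k)]
    return solve(XtX, Xty)

# Hypothetical data lying exactly on y = 1 + 2x + 3x^2.
funcs = [lambda x: 1, lambda x: x, lambda x: x * x]
xs = [0, 1, 2, 3]
ys = [1, 6, 17, 34]
beta = least_squares_fit(xs, ys, funcs)

print(beta)
```

The model is linear in the parameters even though the basis functions are not linear in x, which is why the same normal equations apply. For data that are poorly scaled, a QR-based solve would usually be preferred to the normal equations.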


EXAMPLE 2 Suppose data points $(x_1, y_1), \ldots, (x_n, y_n)$ appear to lie along some
sort of parabola instead of a straight line. For instance, if the x-coordinate denotes the
production level for a company, and y denotes the average cost per unit of operating at
a level of x units per day, then a typical average cost curve looks like a parabola that
opens upward (Figure 3). In ecology, a parabolic curve that opens downward is used
to model the net primary production of nutrients in a plant, as a function of the surface
area of the foliage (Figure 4). Suppose we wish to approximate the data by an equation
of the form

\[ y = \beta_0 + \beta_1 x + \beta_2 x^2 \tag{3} \]

Describe the linear model that produces a “least-squares fit” of the data by equation (3).

FIGURE 4 Production of nutrients.

SOLUTION Equation (3) describes the ideal relationship. Suppose the actual values of
the parameters are $\beta_0$, $\beta_1$, $\beta_2$. Then the coordinates of the first data point $(x_1, y_1)$ satisfy
an equation of the form

\[ y_1 = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \epsilon_1 \]

where $\epsilon_1$ is the residual error between the observed value $y_1$ and the predicted y-value
$\beta_0 + \beta_1 x_1 + \beta_2 x_1^2$. Each data point determines a similar equation:

\[ \begin{aligned} y_1 &= \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \epsilon_1 \\ y_2 &= \beta_0 + \beta_1 x_2 + \beta_2 x_2^2 + \epsilon_2 \\ &\;\;\vdots \\ y_n &= \beta_0 + \beta_1 x_n + \beta_2 x_n^2 + \epsilon_n \end{aligned} \]

It is a simple matter to write this system of equations in the form $y = X\beta + \epsilon$. To find
X, inspect the first few rows of the system and look for the pattern.

\[ \underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}}_{y} = \underbrace{\begin{bmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 \end{bmatrix}}_{X} \underbrace{\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}}_{\beta} + \underbrace{\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}}_{\epsilon} \]

EXAMPLE 3 If data points tend to follow a pattern such as in Figure 5, then an
appropriate model might be an equation of the form

\[ y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 \]

Such data, for instance, could come from a company’s total costs, as a function of the
level of production. Describe the linear model that gives a least-squares fit of this type
to data $(x_1, y_1), \ldots, (x_n, y_n)$.

FIGURE 5 Data points along a cubic curve.

SOLUTION By an analysis similar to that in Example 2, we obtain

\[ \underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}}_{\text{Observation vector}} = \underbrace{\begin{bmatrix} 1 & x_1 & x_1^2 & x_1^3 \\ 1 & x_2 & x_2^2 & x_2^3 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 & x_n^3 \end{bmatrix}}_{\text{Design matrix}} \underbrace{\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}}_{\text{Parameter vector}} + \underbrace{\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}}_{\text{Residual vector}} \]

Multiple Regression

Suppose an experiment involves two independent variables, say u and v, and one
dependent variable, y. A simple equation for predicting y from u and v has the form

\[ y = \beta_0 + \beta_1 u + \beta_2 v \tag{4} \]

A more general prediction equation might have the form

\[ y = \beta_0 + \beta_1 u + \beta_2 v + \beta_3 u^2 + \beta_4 uv + \beta_5 v^2 \tag{5} \]

This equation is used in geology, for instance, to model erosion surfaces, glacial cirques,
soil pH, and other quantities. In such cases, the least-squares fit is called a trend surface.

Equations (4) and (5) both lead to a linear model because they are linear in the
unknown parameters (even though u and v are multiplied). In general, a linear model
will arise whenever y is to be predicted by an equation of the form

\[ y = \beta_0 f_0(u, v) + \beta_1 f_1(u, v) + \cdots + \beta_k f_k(u, v) \]

with $f_0, \ldots, f_k$ any sort of known functions and $\beta_0, \ldots, \beta_k$ unknown weights.


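EXAMPLE 4 In geography, local models of terrain are constructed from data
$(u_1, v_1, y_1), \ldots, (u_n, v_n, y_n)$, where $u_j$, $v_j$, and $y_j$ are latitude, longitude, and altitude,
respectively. Describe the linear model based on (4) that gives a least-squares fit to such
data. The solution is called the least-squares plane. See Figure 6.

FIGURE 6 A least-squares plane.

SOLUTION We expect the data to satisfy the following equations:

\[ \begin{aligned} y_1 &= \beta_0 + \beta_1 u_1 + \beta_2 v_1 + \epsilon_1 \\ y_2 &= \beta_0 + \beta_1 u_2 + \beta_2 v_2 + \epsilon_2 \\ &\;\;\vdots \\ y_n &= \beta_0 + \beta_1 u_n + \beta_2 v_n + \epsilon_n \end{aligned} \]

This system has the matrix form $y = X\beta + \epsilon$, where

\[ \underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}}_{\text{Observation vector}}, \quad \underbrace{\begin{bmatrix} 1 & u_1 & v_1 \\ 1 & u_2 & v_2 \\ \vdots & \vdots & \vdots \\ 1 & u_n & v_n \end{bmatrix}}_{\text{Design matrix}}, \quad \underbrace{\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}}_{\text{Parameter vector}}, \quad \underbrace{\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}}_{\text{Residual vector}} \]

The Geometry of a Linear Model 6-19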


Example 4 shows that the linear model for multiple regression has the same abstract
form as the model for the simple regression in the earlier examples. Linear algebra gives
us the power to understand the general principle behind all the linear models. Once X
is defined properly, the normal equations for $\beta$ have the same matrix form, no matter
how many variables are involved. Thus, for any linear model where $X^T X$ is invertible,
the least-squares $\hat{\beta}$ is given by $(X^T X)^{-1} X^T y$.

Further Reading 

Ferguson, J., Introduction to Linear Algebra in Geology (New York: Chapman & Hall, 
1994). 

Krumbein, W. C., and F. A. Graybill, An Introduction to Statistical Models in Geology 
(New York: McGraw-Hill, 1965). 

Legendre, P., and L. Legendre, Numerical Ecology (Amsterdam: Elsevier, 1998). 

Unwin, David J., An Introduction to Trend Surface Analysis, Concepts and Techniques
in Modern Geography, No. 5 (Norwich, England: Geo Books, 1975).


PRACTICE PROBLEM 

When the monthly sales of a product are subject to seasonal fluctuations, a curve that
approximates the sales data might have the form

\[ y = \beta_0 + \beta_1 x + \beta_2 \sin(2\pi x/12) \]

where x is the time in months. The term $\beta_0 + \beta_1 x$ gives the basic sales trend, and the
sine term reflects the seasonal changes in sales. Give the design matrix and the parameter
vector for the linear model that leads to a least-squares fit of the equation above. Assume
the data are $(x_1, y_1), \ldots, (x_n, y_n)$.

6.6 EXERCISES

In Exercises 1-4, find the equation $y = \beta_0 + \beta_1 x$ of the
least-squares line that best fits the given data points.

1. (0,1), (1,1), (2, 2), (3, 2) 

2. (1,0), (2,1), (4, 2), (5, 3) 

3. (-1,0), (0,1), (1,2), (2,4) 

4. (2, 3), (3,2), (5,1), (6,0) 

5. Let X be the design matrix used to find the least-squares line
to fit data $(x_1, y_1), \ldots, (x_n, y_n)$. Use a theorem in Section 6.5
to show that the normal equations have a unique solution
if and only if the data include at least two data points with
different x-coordinates.

6. Let X be the design matrix in Example 2 corresponding to
a least-squares fit of a parabola to data $(x_1, y_1), \ldots, (x_n, y_n)$.
Suppose $x_1$, $x_2$, and $x_3$ are distinct. Explain why there is only
one parabola that fits the data best, in a least-squares sense.
(See Exercise 5.)

7. A certain experiment produces the data (1, 1.8), (2, 2.7),
(3, 3.4), (4, 3.8), (5, 3.9). Describe the model that produces
a least-squares fit of these points by a function of the form

\[ y = \beta_1 x + \beta_2 x^2 \]

Such a function might arise, for example, as the revenue from
the sale of x units of a product, when the amount offered for
sale affects the price to be set for the product.

a. Give the design matrix, the observation vector, and the
unknown parameter vector.

b. [M] Find the associated least-squares curve for the data.

8. A simple curve that often makes a good model for the
variable costs of a company, as a function of the sales level x,
has the form $y = \beta_1 x + \beta_2 x^2 + \beta_3 x^3$. There is no constant
term because fixed costs are not included.

a. Give the design matrix and the parameter vector for the
linear model that leads to a least-squares fit of the
equation above, with data $(x_1, y_1), \ldots, (x_n, y_n)$.

b. [M] Find the least-squares curve of the form above to fit
the data (4, 1.58), (6, 2.08), (8, 2.5), (10, 2.8), (12, 3.1),
(14, 3.4), (16, 3.8), and (18, 4.32), with values in
thousands. If possible, produce a graph that shows the data
points and the graph of the cubic approximation.

9. A certain experiment produces the data (1, 7.9), (2, 5.4), and
(3, -.9). Describe the model that produces a least-squares fit
of these points by a function of the form

\[ y = A \cos x + B \sin x \]

10. Suppose radioactive substances A and B have decay
constants of .02 and .07, respectively. If a mixture of these two
substances at time t = 0 contains $M_A$ grams of A and $M_B$
grams of B, then a model for the total amount y of the mixture
present at time t is

\[ y = M_A e^{-.02t} + M_B e^{-.07t} \tag{6} \]

Suppose the initial amounts $M_A$ and $M_B$ are unknown,
but a scientist is able to measure the total amounts
present at several times and records the following points
$(t_i, y_i)$: (10, 21.34), (11, 20.68), (12, 20.05), (14, 18.87),
and (15, 18.30).

a. Describe a linear model that can be used to estimate $M_A$
and $M_B$.

b. [M] Find the least-squares curve based on (6).



Halley’s Comet last appeared in 1986 and will reappear in 
2061. 


11. [M] According to Kepler’s first law, a comet should have
an elliptic, parabolic, or hyperbolic orbit (with gravitational
attractions from the planets ignored). In suitable polar
coordinates, the position $(r, \vartheta)$ of a comet satisfies an equation of
the form

\[ r = \beta + e(r \cdot \cos \vartheta) \]

where $\beta$ is a constant and e is the eccentricity of the orbit,
with $0 \le e < 1$ for an ellipse, e = 1 for a parabola, and e > 1
for a hyperbola. Suppose observations of a newly discovered
comet provide the data below. Determine the type of orbit,
and predict where the comet will be when $\vartheta = 4.6$ (radians).³

    $\vartheta$   .88    1.10   1.42   1.77   2.14
    r             3.00   2.30   1.65   1.25   1.01

12. [M] A healthy child’s systolic blood pressure p (in
millimeters of mercury) and weight w (in pounds) are approximately
related by the equation

\[ \beta_0 + \beta_1 \ln w = p \]

Use the following experimental data to estimate the systolic
blood pressure of a healthy child weighing 100 pounds.

    w       44     61     81     113    131
    ln w    3.78   4.11   4.39   4.73   4.88
    p       91     98     103    110    112

³ The basic idea of least-squares fitting of data is due to K. F. Gauss
(and, independently, to A. Legendre), whose initial rise to fame occurred
in 1801 when he used the method to determine the path of the asteroid
Ceres. Forty days after the asteroid was discovered, it disappeared behind
the sun. Gauss predicted it would appear ten months later and gave its
location. The accuracy of the prediction astonished the European scientific
community.

13. [M] To measure the takeoff performance of an airplane, the
horizontal position of the plane was measured every second,
from t = 0 to t = 12. The positions (in feet) were: 0, 8.8,
29.9, 62.0, 104.7, 159.1, 222.0, 294.5, 380.4, 471.1, 571.7,
686.8, and 809.2.

a. Find the least-squares cubic curve $y = \beta_0 + \beta_1 t + \beta_2 t^2 + \beta_3 t^3$
for these data.

b. Use the result of part (a) to estimate the velocity of the
plane when t = 4.5 seconds.

14. Let $\bar{x} = \frac{1}{n}(x_1 + \cdots + x_n)$ and $\bar{y} = \frac{1}{n}(y_1 + \cdots + y_n)$. Show
that the least-squares line for the data $(x_1, y_1), \ldots, (x_n, y_n)$
must pass through $(\bar{x}, \bar{y})$. That is, show that $\bar{x}$ and $\bar{y}$ satisfy
the linear equation $\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}$. [Hint: Derive this
equation from the vector equation $y = X\hat{\beta} + \epsilon$. Denote the first
column of X by 1. Use the fact that the residual vector $\epsilon$ is
orthogonal to the column space of X and hence is orthogonal
to 1.]

Given data for a least-squares problem, $(x_1, y_1), \ldots, (x_n, y_n)$, the
following abbreviations are helpful:

\[ \textstyle \sum x = \sum_{i=1}^{n} x_i, \quad \sum x^2 = \sum_{i=1}^{n} x_i^2, \quad \sum y = \sum_{i=1}^{n} y_i, \quad \sum xy = \sum_{i=1}^{n} x_i y_i \]

The normal equations for a least-squares line $y = \hat{\beta}_0 + \hat{\beta}_1 x$ may
be written in the form

\[ \begin{aligned} n\hat{\beta}_0 + \hat{\beta}_1 \textstyle\sum x &= \textstyle\sum y \\ \hat{\beta}_0 \textstyle\sum x + \hat{\beta}_1 \textstyle\sum x^2 &= \textstyle\sum xy \end{aligned} \tag{7} \]

15. Derive the normal equations (7) from the matrix form given
in this section.

16. Use a matrix inverse to solve the system of equations in (7)
and thereby obtain formulas for $\hat{\beta}_0$ and $\hat{\beta}_1$ that appear in many
statistics texts.

17. a. Rewrite the data in Example 1 with new x-coordinates
in mean deviation form. Let X be the associated design
matrix. Why are the columns of X orthogonal?

b. Write the normal equations for the data in part (a), and
solve them to find the least-squares line, $y = \beta_0 + \beta_1 x^*$,
where $x^* = x - 5.5$.

18. Suppose the x-coordinates of the data $(x_1, y_1), \ldots, (x_n, y_n)$
are in mean deviation form, so that $\sum x_i = 0$. Show that if
X is the design matrix for the least-squares line in this case,
then $X^T X$ is a diagonal matrix.

Exercises 19 and 20 involve a design matrix X with two or more
columns and a least-squares solution $\hat{\beta}$ of $y = X\beta$. Consider the
following numbers.

(i) $\|X\hat{\beta}\|^2$, the sum of the squares of the “regression term.”
Denote this number by SS(R).

(ii) $\|y - X\hat{\beta}\|^2$, the sum of the squares for error term. Denote
this number by SS(E).

(iii) $\|y\|^2$, the “total” sum of the squares of the y-values. Denote
this number by SS(T).

Every statistics text that discusses regression and the linear model
$y = X\beta + \epsilon$ introduces these numbers, though terminology and
notation vary somewhat. To simplify matters, assume that the
mean of the y-values is zero. In this case, SS(T) is proportional to
what is called the variance of the set of y-values.

19. Justify the equation SS(T) = SS(R) + SS(E). [Hint: Use a
theorem, and explain why the hypotheses of the theorem are
satisfied.] This equation is extremely important in statistics,
both in regression theory and in the analysis of variance.

20. Show that $\|X\hat{\beta}\|^2 = \hat{\beta}^T X^T y$. [Hint: Rewrite the left side
and use the fact that $\hat{\beta}$ satisfies the normal equations.] This
formula for SS(R) is used in statistics. From this and from
Exercise 19, obtain the standard formula for SS(E):

\[ SS(E) = y^T y - \hat{\beta}^T X^T y \]


SOLUTION TO PRACTICE PROBLEM

FIGURE Sales trend with seasonal fluctuations.

Construct X and $\beta$ so that the kth row of $X\beta$ is the predicted y-value that corresponds
to the data point $(x_k, y_k)$, namely,

\[ \beta_0 + \beta_1 x_k + \beta_2 \sin(2\pi x_k/12) \]

It should be clear that

\[ X = \begin{bmatrix} 1 & x_1 & \sin(2\pi x_1/12) \\ \vdots & \vdots & \vdots \\ 1 & x_n & \sin(2\pi x_n/12) \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} \]

6.7 INNER PRODUCT SPACES


Notions of length, distance, and orthogonality are often important in applications
involving a vector space. For $\mathbb{R}^n$, these concepts were based on the properties of the
inner product listed in Theorem 1 of Section 6.1. For other spaces, we need analogues of
the inner product with the same properties. The conclusions of Theorem 1 now become
axioms in the following definition.


DEFINITION An inner product on a vector space V is a function that, to each pair of vectors
u and v in V, associates a real number $\langle u, v \rangle$ and satisfies the following axioms,
for all u, v, and w in V and all scalars c:

1. $\langle u, v \rangle = \langle v, u \rangle$

2. $\langle u + v, w \rangle = \langle u, w \rangle + \langle v, w \rangle$

3. $\langle cu, v \rangle = c \langle u, v \rangle$

4. $\langle u, u \rangle \ge 0$, and $\langle u, u \rangle = 0$ if and only if u = 0

A vector space with an inner product is called an inner product space.


The vector space $\mathbb{R}^n$ with the standard inner product is an inner product space, and
nearly everything discussed in this chapter for $\mathbb{R}^n$ carries over to inner product spaces.
The examples in this section and the next lay the foundation for a variety of applications
treated in courses in engineering, physics, mathematics, and statistics.

EXAMPLE 1 Fix any two positive numbers, say 4 and 5, and for vectors
$u = (u_1, u_2)$ and $v = (v_1, v_2)$ in $\mathbb{R}^2$, set

\[ \langle u, v \rangle = 4u_1 v_1 + 5u_2 v_2 \tag{1} \]

Show that equation (1) defines an inner product.

SOLUTION Certainly Axiom 1 is satisfied, because $\langle u, v \rangle = 4u_1 v_1 + 5u_2 v_2 =
4v_1 u_1 + 5v_2 u_2 = \langle v, u \rangle$. If $w = (w_1, w_2)$, then

\[ \begin{aligned} \langle u + v, w \rangle &= 4(u_1 + v_1)w_1 + 5(u_2 + v_2)w_2 \\ &= 4u_1 w_1 + 5u_2 w_2 + 4v_1 w_1 + 5v_2 w_2 \\ &= \langle u, w \rangle + \langle v, w \rangle \end{aligned} \]

This verifies Axiom 2. For Axiom 3, compute

\[ \langle cu, v \rangle = 4(cu_1)v_1 + 5(cu_2)v_2 = c(4u_1 v_1 + 5u_2 v_2) = c \langle u, v \rangle \]

For Axiom 4, note that $\langle u, u \rangle = 4u_1^2 + 5u_2^2 \ge 0$, and $4u_1^2 + 5u_2^2 = 0$ only if $u_1 = u_2 = 0$,
that is, if u = 0. Also, $\langle 0, 0 \rangle = 0$. So (1) defines an inner product on $\mathbb{R}^2$. ■
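The four axioms for the weighted product in (1) can also be spot-checked numerically. The sketch below is only a sanity check on random integer vectors, not a substitute for the algebraic verification above; the function names and the choice of test range are assumptions made for illustration.

```python
import random

def inner(u, v, weights=(4, 5)):
    """Weighted inner product <u, v> = 4*u1*v1 + 5*u2*v2 from equation (1)."""
    return sum(w * ui * vi for w, ui, vi in zip(weights, u, v))

def check_axioms(trials=200, seed=0):
    """Spot-check Axioms 1-4 on random integer vectors (exact arithmetic)."""
    rng = random.Random(seed)
    for _ in range(trials):
        u = [rng.randint(-9, 9) for _ in range(2)]
        v = [rng.randint(-9, 9) for _ in range(2)]
        w = [rng.randint(-9, 9) for _ in range(2)]
        c = rng.randint(-9, 9)
        u_plus_v = [a + b for a, b in zip(u, v)]
        cu = [c * a for a in u]
        assert inner(u, v) == inner(v, u)                          # Axiom 1
        assert inner(u_plus_v, w) == inner(u, w) + inner(v, w)     # Axiom 2
        assert inner(cu, v) == c * inner(u, v)                     # Axiom 3
        assert inner(u, u) >= 0                                    # Axiom 4
        assert (inner(u, u) == 0) == (u == [0, 0])                 # Axiom 4
    return True

print(check_axioms())
```

Integer inputs keep every comparison exact, so a failed assertion would indicate a genuine violation rather than rounding error.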

Inner products similar to (1) can be defined on $\mathbb{R}^n$. They arise naturally in
connection with “weighted least-squares” problems, in which weights are assigned to the
various entries in the sum for the inner product in such a way that more importance is
given to the more reliable measurements.

From now on, when an inner product space involves polynomials or other functions,
we will write the functions in the familiar way, rather than use the boldface type for
vectors. Nevertheless, it is important to remember that each function is a vector when it
is treated as an element of a vector space.

EXAMPLE 2 Let $t_0, \ldots, t_n$ be distinct real numbers. For p and q in $\mathbb{P}_n$, define

\[ \langle p, q \rangle = p(t_0)q(t_0) + p(t_1)q(t_1) + \cdots + p(t_n)q(t_n) \tag{2} \]

Inner product Axioms 1-3 are readily checked. For Axiom 4, note that

\[ \langle p, p \rangle = [p(t_0)]^2 + [p(t_1)]^2 + \cdots + [p(t_n)]^2 \ge 0 \]

Also, $\langle 0, 0 \rangle = 0$. (The boldface zero here denotes the zero polynomial, the zero vector
in $\mathbb{P}_n$.) If $\langle p, p \rangle = 0$, then p must vanish at n + 1 points: $t_0, \ldots, t_n$. This is possible
only if p is the zero polynomial, because the degree of p is less than n + 1. Thus (2)
defines an inner product on $\mathbb{P}_n$. ■

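EXAMPLE 3 Let $V$ be $\mathbb{P}_2$, with the inner product from Example 2, where $t_0 = 0$,
$t_1 = \tfrac{1}{2}$, and $t_2 = 1$. Let $p(t) = 12t^2$ and $q(t) = 2t - 1$. Compute $\langle p, q \rangle$ and $\langle q, q \rangle$.

SOLUTION

\[ \begin{aligned} \langle p, q \rangle &= p(0)q(0) + p\left(\tfrac{1}{2}\right)q\left(\tfrac{1}{2}\right) + p(1)q(1) \\ &= (0)(-1) + (3)(0) + (12)(1) = 12 \\ \langle q, q \rangle &= [q(0)]^2 + \left[q\left(\tfrac{1}{2}\right)\right]^2 + [q(1)]^2 \\ &= (-1)^2 + (0)^2 + (1)^2 = 2 \end{aligned} \]

■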
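The evaluation inner product of Example 2 is straightforward to compute by machine. The sketch below reproduces the numbers of Example 3; the helper name `eval_inner` is an assumption introduced here for illustration.

```python
from fractions import Fraction

def eval_inner(p, q, points):
    """Evaluation inner product (2): sum of p(t)*q(t) over the given points."""
    return sum(p(t) * q(t) for t in points)

# Evaluation points t0 = 0, t1 = 1/2, t2 = 1, as in Example 3.
points = (Fraction(0), Fraction(1, 2), Fraction(1))

def p(t):
    return 12 * t**2

def q(t):
    return 2 * t - 1

pq = eval_inner(p, q, points)   # <p, q>
qq = eval_inner(q, q, points)   # <q, q>
pp = eval_inner(p, p, points)   # <p, p>, the squared length of p
print(pq, qq, pp)
```

The value of `pp` is the squared length of $p$ under this inner product, so its square root is the length of $p$.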


Lengths, Distances, and Orthogonality 


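Let $V$ be an inner product space, with the inner product denoted by $\langle \mathbf{u}, \mathbf{v} \rangle$. Just as in $\mathbb{R}^n$,
we define the length, or norm, of a vector $\mathbf{v}$ to be the scalar

\[ \|\mathbf{v}\| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle} \]

Equivalently, $\|\mathbf{v}\|^2 = \langle \mathbf{v}, \mathbf{v} \rangle$. (This definition makes sense because $\langle \mathbf{v}, \mathbf{v} \rangle \ge 0$, but the
definition does not say that $\langle \mathbf{v}, \mathbf{v} \rangle$ is a “sum of squares,” because $\mathbf{v}$ need not be an element
of $\mathbb{R}^n$.)

A unit vector is one whose length is 1. The distance between $\mathbf{u}$ and $\mathbf{v}$ is $\|\mathbf{u} - \mathbf{v}\|$.
Vectors $\mathbf{u}$ and $\mathbf{v}$ are orthogonal if $\langle \mathbf{u}, \mathbf{v} \rangle = 0$.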


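EXAMPLE 4 Let $\mathbb{P}_2$ have the inner product (2) of Example 3. Compute the lengths
of the vectors $p(t) = 12t^2$ and $q(t) = 2t - 1$.

SOLUTION

\[ \begin{aligned} \|p\|^2 &= \langle p, p \rangle = [p(0)]^2 + \left[p\left(\tfrac{1}{2}\right)\right]^2 + [p(1)]^2 \\ &= 0 + [3]^2 + [12]^2 = 153 \\ \|p\| &= \sqrt{153} \end{aligned} \]

From Example 3, $\langle q, q \rangle = 2$. Hence $\|q\| = \sqrt{2}$. ■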




The Gram-Schmidt Process 

The existence of orthogonal bases for finite-dimensional subspaces of an inner product 
space can be established by the Gram-Schmidt process, just as in $\mathbb{R}^n$. Certain orthogonal
bases that arise frequently in applications can be constructed by this process. 

The orthogonal projection of a vector onto a subspace W with an orthogonal basis 
can be constructed as usual. The projection does not depend on the choice of orthogonal 
basis, and it has the properties described in the Orthogonal Decomposition Theorem and 
the Best Approximation Theorem. 
EXAMPLE 5 Let $V$ be $\mathbb{P}_4$ with the inner product in Example 2, involving evaluation
of polynomials at $-2$, $-1$, $0$, $1$, and $2$, and view $\mathbb{P}_2$ as a subspace of $V$. Produce an
orthogonal basis for $\mathbb{P}_2$ by applying the Gram-Schmidt process to the polynomials $1$, $t$,
and $t^2$.

SOLUTION The inner product depends only on the values of a polynomial at $-2, \ldots, 2$,
so we list the values of each polynomial as a vector in $\mathbb{R}^5$, underneath the name of the
polynomial:¹

\[ \begin{array}{lccc} \text{Polynomial:} & 1 & t & t^2 \\[4pt] \text{Vector of values:} & \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}, & \begin{bmatrix} -2 \\ -1 \\ 0 \\ 1 \\ 2 \end{bmatrix}, & \begin{bmatrix} 4 \\ 1 \\ 0 \\ 1 \\ 4 \end{bmatrix} \end{array} \]

The inner product of two polynomials in $V$ equals the (standard) inner product of their
corresponding vectors in $\mathbb{R}^5$. Observe that $t$ is orthogonal to the constant function 1. So
take $p_0(t) = 1$ and $p_1(t) = t$. For $p_2$, use the vectors in $\mathbb{R}^5$ to compute the projection
of $t^2$ onto $\operatorname{Span}\{p_0, p_1\}$:

\[ \begin{aligned} \langle t^2, p_0 \rangle &= \langle t^2, 1 \rangle = 4 + 1 + 0 + 1 + 4 = 10 \\ \langle p_0, p_0 \rangle &= 5 \\ \langle t^2, p_1 \rangle &= \langle t^2, t \rangle = -8 + (-1) + 0 + 1 + 8 = 0 \end{aligned} \]

The orthogonal projection of $t^2$ onto $\operatorname{Span}\{1, t\}$ is $\tfrac{10}{5}p_0 + 0p_1$. Thus

\[ p_2(t) = t^2 - 2p_0(t) = t^2 - 2 \]

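An orthogonal basis for the subspace $\mathbb{P}_2$ of $V$ is:

\[ \begin{array}{lccc} \text{Polynomial:} & p_0 & p_1 & p_2 \\[4pt] \text{Vector of values:} & \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}, & \begin{bmatrix} -2 \\ -1 \\ 0 \\ 1 \\ 2 \end{bmatrix}, & \begin{bmatrix} 2 \\ -1 \\ -2 \\ -1 \\ 2 \end{bmatrix} \end{array} \tag{3} \]

■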
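Because the inner product in Example 5 reduces to the standard dot product of value vectors in $\mathbb{R}^5$, the Gram-Schmidt computation can be sketched directly on those vectors. This is an illustrative sketch only; the function names `dot` and `gram_schmidt` are assumptions, and the input vectors are assumed to be linearly independent.

```python
from fractions import Fraction

def dot(x, y):
    """Standard dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vectors):
    """Orthogonalize the vectors in order (no normalization), exactly."""
    basis = []
    for v in vectors:
        w = [Fraction(a) for a in v]
        for b in basis:
            # Subtract the projection of v onto each earlier basis vector.
            coeff = dot(v, b) / dot(b, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        basis.append(w)
    return basis

points = [-2, -1, 0, 1, 2]
# Value vectors of 1, t, and t^2 at the evaluation points.
values = [[1 for t in points], [t for t in points], [t * t for t in points]]
p0, p1, p2 = gram_schmidt(values)
print([int(x) for x in p2])
```

The third vector returned holds the values of $p_2(t) = t^2 - 2$ at the five evaluation points.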



Best Approximation in Inner Product Spaces 

A common problem in applied mathematics involves a vector space $V$ whose elements
are functions. The problem is to approximate a function $f$ in $V$ by a function $g$ from a
specified subspace $W$ of $V$. The “closeness” of the approximation of $f$ depends on the
way $\|f - g\|$ is defined. We will consider only the case in which the distance between
$f$ and $g$ is determined by an inner product. In this case, the best approximation to $f$ by
functions in $W$ is the orthogonal projection of $f$ onto the subspace $W$.

EXAMPLE 6 Let $V$ be $\mathbb{P}_4$ with the inner product in Example 5, and let $p_0$, $p_1$,
and $p_2$ be the orthogonal basis found in Example 5 for the subspace $\mathbb{P}_2$. Find the best
approximation to $p(t) = 5 - \tfrac{1}{2}t^4$ by polynomials in $\mathbb{P}_2$.


¹ Each polynomial in $\mathbb{P}_4$ is uniquely determined by its value at the five numbers $-2, \ldots, 2$. In fact, the
correspondence between $p$ and its vector of values is an isomorphism, that is, a one-to-one mapping onto
$\mathbb{R}^5$ that preserves linear combinations.
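SOLUTION The values of $p_0$, $p_1$, and $p_2$ at the numbers $-2$, $-1$, $0$, $1$, and $2$ are listed
in $\mathbb{R}^5$ vectors in (3) above. The corresponding values for $p$ are $-3$, $9/2$, $5$, $9/2$, and $-3$.
Compute

\[ \begin{aligned} \langle p, p_0 \rangle &= 8, & \langle p, p_1 \rangle &= 0, & \langle p, p_2 \rangle &= -31 \\ \langle p_0, p_0 \rangle &= 5, & & & \langle p_2, p_2 \rangle &= 14 \end{aligned} \]

Then the best approximation in $V$ to $p$ by polynomials in $\mathbb{P}_2$ is

\[ \begin{aligned} \hat{p} = \operatorname{proj}_{\mathbb{P}_2} p &= \frac{\langle p, p_0 \rangle}{\langle p_0, p_0 \rangle}p_0 + \frac{\langle p, p_1 \rangle}{\langle p_1, p_1 \rangle}p_1 + \frac{\langle p, p_2 \rangle}{\langle p_2, p_2 \rangle}p_2 \\ &= \tfrac{8}{5}p_0 + \tfrac{-31}{14}p_2 = \tfrac{8}{5} - \tfrac{31}{14}(t^2 - 2) \end{aligned} \]

This polynomial is the closest to $p$ of all polynomials in $\mathbb{P}_2$, when the distance between
polynomials is measured only at $-2$, $-1$, $0$, $1$, and $2$. See Figure 1. ■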
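The projection weights in Example 6 can be reproduced from the value vectors alone. The following is a minimal sketch under the assumption that the orthogonal basis is given by the vectors in (3); the helper names are illustrative.

```python
from fractions import Fraction

def dot(x, y):
    """Standard dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))

def projection_coefficients(y, basis):
    """Weights <y, b>/<b, b> of the projection of y onto an orthogonal basis."""
    return [Fraction(dot(y, b), dot(b, b)) for b in basis]

points = [-2, -1, 0, 1, 2]
# Value vectors of p0, p1, p2 from (3).
basis = [[1, 1, 1, 1, 1], [-2, -1, 0, 1, 2], [2, -1, -2, -1, 2]]
# Values of p(t) = 5 - t^4/2 at the evaluation points.
p_values = [5 - Fraction(t**4, 2) for t in points]

coeffs = projection_coefficients(p_values, basis)
print(coeffs)
```

Because the basis is orthogonal, each weight is computed independently of the others, which is the point of the projection formula used in the example.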





The polynomials $p_0$, $p_1$, and $p_2$ in Examples 5 and 6 belong to a class of polynomials
that are referred to in statistics as orthogonal polynomials.² The orthogonality refers
to the type of inner product described in Example 2.



Two Inequalities 


Given a vector v in an inner product space V and given a finite-dimensional subspace 
W , we may apply the Pythagorean Theorem to the orthogonal decomposition of v with 
respect to W and obtain 


\[ \|\mathbf{v}\|^2 = \|\operatorname{proj}_W \mathbf{v}\|^2 + \|\mathbf{v} - \operatorname{proj}_W \mathbf{v}\|^2 \]


FIGURE 2 

The hypotenuse is the longest side. 


See Figure 2. In particular, this shows that the norm of the projection of v onto W does 
not exceed the norm of v itself. This simple observation leads to the following important 
inequality. 


THEOREM 16 


The Cauchy-Schwarz Inequality 


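For all $\mathbf{u}$, $\mathbf{v}$ in $V$,

\[ |\langle \mathbf{u}, \mathbf{v} \rangle| \le \|\mathbf{u}\|\,\|\mathbf{v}\| \tag{4} \]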






2 See Statistics and Experimental Design in Engineering and the Physical Sciences , 2nd ed., by Norman 
L. Johnson and Fred C. Leone (New York: John Wiley & Sons, 1977). Tables there list “Orthogonal 
Polynomials,” which are simply the values of the polynomial at numbers such as —2, —1,0, 1, and 2. 
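PROOF If $\mathbf{u} = \mathbf{0}$, then both sides of (4) are zero, and hence the inequality is true in this
case. (See Practice Problem 1.) If $\mathbf{u} \ne \mathbf{0}$, let $W$ be the subspace spanned by $\mathbf{u}$. Recall
that $\|c\mathbf{u}\| = |c|\,\|\mathbf{u}\|$ for any scalar $c$. Thus

\[ \|\operatorname{proj}_W \mathbf{v}\| = \left\| \frac{\langle \mathbf{v}, \mathbf{u} \rangle}{\langle \mathbf{u}, \mathbf{u} \rangle}\mathbf{u} \right\| = \frac{|\langle \mathbf{v}, \mathbf{u} \rangle|}{|\langle \mathbf{u}, \mathbf{u} \rangle|}\|\mathbf{u}\| = \frac{|\langle \mathbf{v}, \mathbf{u} \rangle|}{\|\mathbf{u}\|^2}\|\mathbf{u}\| = \frac{|\langle \mathbf{u}, \mathbf{v} \rangle|}{\|\mathbf{u}\|} \]

Since $\|\operatorname{proj}_W \mathbf{v}\| \le \|\mathbf{v}\|$, we have $\dfrac{|\langle \mathbf{u}, \mathbf{v} \rangle|}{\|\mathbf{u}\|} \le \|\mathbf{v}\|$, which gives (4). ■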


The Cauchy-Schwarz inequality is useful in many branches of mathematics. A few 
simple applications are presented in the exercises. Our main need for this inequality here 
is to prove another fundamental inequality involving norms of vectors. See Figure 3. 


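THEOREM 17

The Triangle Inequality

For all $\mathbf{u}$, $\mathbf{v}$ in $V$,

\[ \|\mathbf{u} + \mathbf{v}\| \le \|\mathbf{u}\| + \|\mathbf{v}\| \]

FIGURE 3

The lengths of the sides of a triangle.

PROOF

\[ \begin{aligned} \|\mathbf{u} + \mathbf{v}\|^2 &= \langle \mathbf{u} + \mathbf{v}, \mathbf{u} + \mathbf{v} \rangle = \langle \mathbf{u}, \mathbf{u} \rangle + 2\langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{v}, \mathbf{v} \rangle \\ &\le \|\mathbf{u}\|^2 + 2|\langle \mathbf{u}, \mathbf{v} \rangle| + \|\mathbf{v}\|^2 \\ &\le \|\mathbf{u}\|^2 + 2\|\mathbf{u}\|\,\|\mathbf{v}\| + \|\mathbf{v}\|^2 \qquad \text{Cauchy-Schwarz} \\ &= (\|\mathbf{u}\| + \|\mathbf{v}\|)^2 \end{aligned} \]

The triangle inequality follows immediately by taking square roots of both sides. ■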
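Both inequalities hold in any inner product space, not only for the standard dot product. As a hedged illustration (a numerical spot check, not a proof), the sketch below tests them on a small grid of integer vectors in $\mathbb{R}^2$ using the weighted inner product of Example 1. The grid and the small floating-point tolerance are assumptions of this sketch.

```python
import itertools
import math

def inner(u, v):
    """Weighted inner product of Example 1: 4*u1*v1 + 5*u2*v2."""
    return 4 * u[0] * v[0] + 5 * u[1] * v[1]

def norm(v):
    """Length induced by the inner product."""
    return math.sqrt(inner(v, v))

TOL = 1e-9  # allowance for floating-point rounding in the square roots
grid = list(itertools.product(range(-3, 4), repeat=2))

cauchy_schwarz_ok = all(
    abs(inner(u, v)) <= norm(u) * norm(v) + TOL
    for u in grid for v in grid
)
triangle_ok = all(
    norm((u[0] + v[0], u[1] + v[1])) <= norm(u) + norm(v) + TOL
    for u in grid for v in grid
)
print(cauchy_schwarz_ok, triangle_ok)
```

Equality in Cauchy-Schwarz occurs when one vector is a multiple of the other, which is why a tolerance is needed once square roots are taken in floating point.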


An Inner Product for $C[a, b]$ (Calculus required)

Probably the most widely used inner product space for applications is the vector space
$C[a, b]$ of all continuous functions on an interval $a \le t \le b$, with an inner product that
we will describe.

We begin by considering a polynomial $p$ and any integer $n$ larger than or equal
to the degree of $p$. Then $p$ is in $\mathbb{P}_n$, and we may compute a “length” for $p$ using the
inner product of Example 2 involving evaluation at $n + 1$ points in $[a, b]$. However, this
length of $p$ captures the behavior at only those $n + 1$ points. Since $p$ is in $\mathbb{P}_n$ for all
large $n$, we could use a much larger $n$, with many more points for the “evaluation” inner
product. See Figure 4.




FIGURE 4 Using different numbers of evaluation points in $[a, b]$ to compute $\|p\|^2$.
Let us partition $[a, b]$ into $n + 1$ subintervals of length $\Delta t = (b - a)/(n + 1)$, and
let $t_0, \ldots, t_n$ be arbitrary points in these subintervals.

If $n$ is large, the inner product on $\mathbb{P}_n$ determined by $t_0, \ldots, t_n$ will tend to give a large
value to $\langle p, p \rangle$, so we scale it down and divide by $n + 1$. Observe that $1/(n + 1) = \Delta t/(b - a)$, and define

\[ \langle p, q \rangle = \frac{1}{n + 1}\sum_{j=0}^{n} p(t_j)q(t_j) = \frac{1}{b - a}\left[ \sum_{j=0}^{n} p(t_j)q(t_j)\,\Delta t \right] \]

Now, let $n$ increase without bound. Since polynomials $p$ and $q$ are continuous functions,
the expression in brackets is a Riemann sum that approaches a definite integral, and we
are led to consider the average value of $p(t)q(t)$ on the interval $[a, b]$:

\[ \frac{1}{b - a}\int_a^b p(t)q(t)\,dt \]

This quantity is defined for polynomials of any degree (in fact, for all continuous
functions), and it has all the properties of an inner product, as the next example shows.
The scale factor $1/(b - a)$ is inessential and is often omitted for simplicity.
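The limiting process described above can be observed numerically. The sketch below is an illustration under stated assumptions: it uses the midpoint of each subinterval as the evaluation point (the text allows arbitrary points), and it takes $p(t) = t$ and $q(t) = t^2$ on $[0, 1]$, for which the average value of $p(t)q(t)$ is $\int_0^1 t^3\,dt = 1/4$.

```python
def scaled_eval_inner(p, q, a, b, n):
    """Evaluation inner product at n+1 points in [a, b], divided by n+1."""
    dt = (b - a) / (n + 1)
    # One point per subinterval; midpoints are an arbitrary but convenient choice.
    points = [a + (j + 0.5) * dt for j in range(n + 1)]
    return sum(p(t) * q(t) for t in points) / (n + 1)

def p(t):
    return t

def q(t):
    return t * t

# As n grows, the scaled sums approach the average value of p*q on [0, 1].
approximations = [scaled_eval_inner(p, q, 0.0, 1.0, n) for n in (4, 49, 999)]
print(approximations)
```

With other choices of evaluation points the sums converge more slowly, but the limit is the same integral.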


EXAMPLE 7 For $f$, $g$ in $C[a, b]$, set

\[ \langle f, g \rangle = \int_a^b f(t)g(t)\,dt \tag{5} \]

Show that (5) defines an inner product on $C[a, b]$.

SOLUTION Inner product Axioms 1-3 follow from elementary properties of definite
integrals. For Axiom 4, observe that

\[ \langle f, f \rangle = \int_a^b [f(t)]^2\,dt \ge 0 \]

The function $[f(t)]^2$ is continuous and nonnegative on $[a, b]$. If the definite integral of
$[f(t)]^2$ is zero, then $[f(t)]^2$ must be identically zero on $[a, b]$, by a theorem in advanced
calculus, in which case $f$ is the zero function. Thus $\langle f, f \rangle = 0$ implies that $f$ is the
zero function on $[a, b]$. So (5) defines an inner product on $C[a, b]$. ■

EXAMPLE 8 Let $V$ be the space $C[0, 1]$ with the inner product of Example 7, and
let $W$ be the subspace spanned by the polynomials $p_1(t) = 1$, $p_2(t) = 2t - 1$, and
$p_3(t) = 12t^2$. Use the Gram-Schmidt process to find an orthogonal basis for $W$.

SOLUTION Let $q_1 = p_1$, and compute

\[ \langle p_2, q_1 \rangle = \int_0^1 (2t - 1)(1)\,dt = (t^2 - t)\Big|_0^1 = 0 \]

So $p_2$ is already orthogonal to $q_1$, and we can take $q_2 = p_2$. For the projection of $p_3$
onto $W_2 = \operatorname{Span}\{q_1, q_2\}$, compute

\[ \begin{aligned} \langle p_3, q_1 \rangle &= \int_0^1 12t^2 \cdot 1\,dt = 4t^3\Big|_0^1 = 4 \\ \langle q_1, q_1 \rangle &= \int_0^1 1 \cdot 1\,dt = t\Big|_0^1 = 1 \\ \langle p_3, q_2 \rangle &= \int_0^1 12t^2(2t - 1)\,dt = \int_0^1 (24t^3 - 12t^2)\,dt = 2 \\ \langle q_2, q_2 \rangle &= \int_0^1 (2t - 1)^2\,dt = \tfrac{1}{6}(2t - 1)^3\Big|_0^1 = \tfrac{1}{3} \end{aligned} \]

Then

\[ \operatorname{proj}_{W_2} p_3 = \frac{\langle p_3, q_1 \rangle}{\langle q_1, q_1 \rangle}q_1 + \frac{\langle p_3, q_2 \rangle}{\langle q_2, q_2 \rangle}q_2 = \frac{4}{1}q_1 + \frac{2}{1/3}q_2 = 4q_1 + 6q_2 \]

and

\[ q_3 = p_3 - \operatorname{proj}_{W_2} p_3 = p_3 - 4q_1 - 6q_2 \]

As a function, $q_3(t) = 12t^2 - 4 - 6(2t - 1) = 12t^2 - 12t + 2$. The orthogonal basis
for the subspace $W$ is $\{q_1, q_2, q_3\}$. ■
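For polynomials, the integral inner product (5) can be evaluated exactly from the coefficients, so the Gram-Schmidt steps of Example 8 can be sketched in exact arithmetic. The representation below (coefficient lists in ascending powers of $t$) and all function names are assumptions made for this illustration.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Product of two polynomials given as ascending coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def integral_inner(p, q, a=0, b=1):
    """<p, q> = integral of p(t) q(t) over [a, b], computed term by term."""
    prod = poly_mul(p, q)
    return sum(c * (Fraction(b) ** (k + 1) - Fraction(a) ** (k + 1)) / (k + 1)
               for k, c in enumerate(prod))

def sub_scaled(p, c, q):
    """Return p - c*q, padding the shorter coefficient list with zeros."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return [Fraction(x) - c * y for x, y in zip(p, q)]

def gram_schmidt(polys):
    """Orthogonalize polynomials in order with respect to integral_inner."""
    basis = []
    for p in polys:
        w = [Fraction(c) for c in p]
        for b in basis:
            w = sub_scaled(w, integral_inner(p, b) / integral_inner(b, b), b)
        basis.append(w)
    return basis

# p1(t) = 1, p2(t) = 2t - 1, p3(t) = 12t^2 on [0, 1].
q1, q2, q3 = gram_schmidt([[1], [-1, 2], [0, 0, 12]])
print(q3)
```

The coefficient list returned for `q3` is read in ascending powers of $t$, that is, constant term first.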


PRACTICE PROBLEMS 

Use the inner product axioms to verify the following statements. 

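1. $\langle \mathbf{v}, \mathbf{0} \rangle = \langle \mathbf{0}, \mathbf{v} \rangle = 0$.

2. $\langle \mathbf{u}, \mathbf{v} + \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{u}, \mathbf{w} \rangle$.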


6.7 EXERCISES 

1. Let $\mathbb{R}^2$ have the inner product of Example 1, and let
$\mathbf{x} = (1, 1)$ and $\mathbf{y} = (5, -1)$.

a. Find $\|\mathbf{x}\|$, $\|\mathbf{y}\|$, and $|\langle \mathbf{x}, \mathbf{y} \rangle|^2$.

b. Describe all vectors $(z_1, z_2)$ that are orthogonal to $\mathbf{y}$.

2. Let $\mathbb{R}^2$ have the inner product of Example 1. Show that
the Cauchy-Schwarz inequality holds for $\mathbf{x} = (3, -2)$ and
$\mathbf{y} = (-2, 1)$. [Suggestion: Study $|\langle \mathbf{x}, \mathbf{y} \rangle|^2$.]

Exercises 3-8 refer to $\mathbb{P}_2$ with the inner product given by evaluation
at $-1$, $0$, and $1$. (See Example 2.)

3. Compute $\langle p, q \rangle$, where $p(t) = 4 + t$, $q(t) = 5 - 4t^2$.

4. Compute $\langle p, q \rangle$, where $p(t) = 3t - t^2$, $q(t) = 3 + 2t^2$.

5. Compute $\|p\|$ and $\|q\|$, for $p$ and $q$ in Exercise 3.

6. Compute $\|p\|$ and $\|q\|$, for $p$ and $q$ in Exercise 4.

7. Compute the orthogonal projection of $q$ onto the subspace
spanned by $p$, for $p$ and $q$ in Exercise 3.

8. Compute the orthogonal projection of $q$ onto the subspace
spanned by $p$, for $p$ and $q$ in Exercise 4.

9. Let $\mathbb{P}_3$ have the inner product given by evaluation at $-3$, $-1$,
$1$, and $3$. Let $p_0(t) = 1$, $p_1(t) = t$, and $p_2(t) = t^2$.

a. Compute the orthogonal projection of $p_2$ onto the subspace
spanned by $p_0$ and $p_1$.

b. Find a polynomial $q$ that is orthogonal to $p_0$ and
$p_1$, such that $\{p_0, p_1, q\}$ is an orthogonal basis for
$\operatorname{Span}\{p_0, p_1, p_2\}$. Scale the polynomial $q$ so that its vector
of values at $(-3, -1, 1, 3)$ is $(1, -1, -1, 1)$.

10. Let $\mathbb{P}_3$ have the inner product as in Exercise 9, with $p_0$, $p_1$,
and $q$ the polynomials described there. Find the best approximation
to $p(t) = t^3$ by polynomials in $\operatorname{Span}\{p_0, p_1, q\}$.

11. Let $p_0$, $p_1$, and $p_2$ be the orthogonal polynomials described
in Example 5, where the inner product on $\mathbb{P}_4$ is given by evaluation
at $-2$, $-1$, $0$, $1$, and $2$. Find the orthogonal projection
of $t^3$ onto $\operatorname{Span}\{p_0, p_1, p_2\}$.

12. Find a polynomial $p_3$ such that $\{p_0, p_1, p_2, p_3\}$ (see Exercise 11)
is an orthogonal basis for the subspace $\mathbb{P}_3$ of
$\mathbb{P}_4$. Scale the polynomial $p_3$ so that its vector of values is
$(-1, 2, 0, -2, 1)$.
13. Let $A$ be any invertible $n \times n$ matrix. Show that for $\mathbf{u}$, $\mathbf{v}$ in
$\mathbb{R}^n$, the formula $\langle \mathbf{u}, \mathbf{v} \rangle = (A\mathbf{u}) \cdot (A\mathbf{v}) = (A\mathbf{u})^T(A\mathbf{v})$ defines
an inner product on $\mathbb{R}^n$.

14. Let $T$ be a one-to-one linear transformation from a vector
space $V$ into $\mathbb{R}^n$. Show that for $\mathbf{u}$, $\mathbf{v}$ in $V$, the formula
$\langle \mathbf{u}, \mathbf{v} \rangle = T(\mathbf{u}) \cdot T(\mathbf{v})$ defines an inner product on $V$.


Use the inner product axioms and other results of this section to 
verify the statements in Exercises 15-18. 

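15. $\langle \mathbf{u}, c\mathbf{v} \rangle = c\langle \mathbf{u}, \mathbf{v} \rangle$ for all scalars $c$.

16. If $\{\mathbf{u}, \mathbf{v}\}$ is an orthonormal set in $V$, then $\|\mathbf{u} - \mathbf{v}\| = \sqrt{2}$.

17. $\langle \mathbf{u}, \mathbf{v} \rangle = \tfrac{1}{4}\|\mathbf{u} + \mathbf{v}\|^2 - \tfrac{1}{4}\|\mathbf{u} - \mathbf{v}\|^2$

18. $\|\mathbf{u} + \mathbf{v}\|^2 + \|\mathbf{u} - \mathbf{v}\|^2 = 2\|\mathbf{u}\|^2 + 2\|\mathbf{v}\|^2$

19. Given $a \ge 0$ and $b \ge 0$, let $\mathbf{u} = \begin{bmatrix} \sqrt{a} \\ \sqrt{b} \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} \sqrt{b} \\ \sqrt{a} \end{bmatrix}$.
Use the Cauchy-Schwarz inequality to compare the geometric
mean $\sqrt{ab}$ with the arithmetic mean $(a + b)/2$.

20. Let $\mathbf{u} = \begin{bmatrix} a \\ b \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$. Use the Cauchy-Schwarz inequality
to show that

\[ \left( \frac{a + b}{2} \right)^2 \le \frac{a^2 + b^2}{2} \]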

Exercises 21-24 refer to $V = C[0, 1]$, with the inner product
given by an integral, as in Example 7.

21. Compute $\langle f, g \rangle$, where $f(t) = 1 - 3t^2$ and $g(t) = t - t^3$.

22. Compute $\langle f, g \rangle$, where $f(t) = 5t - 3$ and $g(t) = t^3 - t^2$.

23. Compute $\|f\|$ for $f$ in Exercise 21.

24. Compute $\|g\|$ for $g$ in Exercise 22.

25. Let $V$ be the space $C[-1, 1]$ with the inner product of Example 7.
Find an orthogonal basis for the subspace spanned
by the polynomials $1$, $t$, and $t^2$. The polynomials in this basis
are called Legendre polynomials.

26. Let $V$ be the space $C[-2, 2]$ with the inner product of Example 7.
Find an orthogonal basis for the subspace spanned by
the polynomials $1$, $t$, and $t^2$.

27. [M] Let $\mathbb{P}_4$ have the inner product as in Example 5, and let
$p_0$, $p_1$, $p_2$ be the orthogonal polynomials from that example.
Using your matrix program, apply the Gram-Schmidt process
to the set $\{p_0, p_1, p_2, t^3, t^4\}$ to create an orthogonal basis
for $\mathbb{P}_4$.

28. [M] Let $V$ be the space $C[0, 2\pi]$ with the inner product
of Example 7. Use the Gram-Schmidt process to
create an orthogonal basis for the subspace spanned by
$\{1, \cos t, \cos^2 t, \cos^3 t\}$. Use a matrix program or computational
program to compute the appropriate definite integrals.


SOLUTIONS TO PRACTICE PROBLEMS 

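1. By Axiom 1, $\langle \mathbf{v}, \mathbf{0} \rangle = \langle \mathbf{0}, \mathbf{v} \rangle$. Then $\langle \mathbf{0}, \mathbf{v} \rangle = \langle 0\mathbf{v}, \mathbf{v} \rangle = 0\langle \mathbf{v}, \mathbf{v} \rangle$, by Axiom 3, so
$\langle \mathbf{0}, \mathbf{v} \rangle = 0$.

2. By Axioms 1, 2, and then 1 again, $\langle \mathbf{u}, \mathbf{v} + \mathbf{w} \rangle = \langle \mathbf{v} + \mathbf{w}, \mathbf{u} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle + \langle \mathbf{w}, \mathbf{u} \rangle =
\langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{u}, \mathbf{w} \rangle$.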


6.8 APPLICATIONS OF INNER PRODUCT SPACES 


The examples in this section suggest how the inner product spaces defined in Section 6.7 
arise in practical problems. The first example is connected with the massive least- 
squares problem of updating the North American Datum, described in the chapter’s 
introductory example. 


Weighted Least-Squares 

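Let $\mathbf{y}$ be a vector of $n$ observations, $y_1, \ldots, y_n$, and suppose we wish to approximate $\mathbf{y}$ by
a vector $\hat{\mathbf{y}}$ that belongs to some specified subspace of $\mathbb{R}^n$. (In Section 6.5, $\hat{\mathbf{y}}$ was written
as $A\mathbf{x}$ so that $\hat{\mathbf{y}}$ was in the column space of $A$.) Denote the entries in $\hat{\mathbf{y}}$ by $\hat{y}_1, \ldots, \hat{y}_n$.
Then the sum of the squares for error, or SS(E), in approximating $\mathbf{y}$ by $\hat{\mathbf{y}}$ is

\[ \text{SS(E)} = (y_1 - \hat{y}_1)^2 + \cdots + (y_n - \hat{y}_n)^2 \tag{1} \]

This is simply $\|\mathbf{y} - \hat{\mathbf{y}}\|^2$, using the standard length in $\mathbb{R}^n$.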
Now suppose the measurements that produced the entries in $\mathbf{y}$ are not equally
reliable. (This was the case for the North American Datum, since measurements were
made over a period of 140 years. As another example, the entries in $\mathbf{y}$ might be
computed from various samples of measurements, with unequal sample sizes.) Then
it becomes appropriate to weight the squared errors in (1) in such a way that more
importance is assigned to the more reliable measurements.¹ If the weights are denoted
by $w_1^2, \ldots, w_n^2$, then the weighted sum of the squares for error is

\[ \text{Weighted SS(E)} = w_1^2(y_1 - \hat{y}_1)^2 + \cdots + w_n^2(y_n - \hat{y}_n)^2 \tag{2} \]

This is the square of the length of $\mathbf{y} - \hat{\mathbf{y}}$, where the length is derived from an inner
product analogous to that in Example 1 in Section 6.7, namely,

\[ \langle \mathbf{x}, \mathbf{y} \rangle = w_1^2x_1y_1 + \cdots + w_n^2x_ny_n \]

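It is sometimes convenient to transform a weighted least-squares problem into an
equivalent ordinary least-squares problem. Let $W$ be the diagonal matrix with (positive)
$w_1, \ldots, w_n$ on its diagonal, so that

\[ W\mathbf{y} = \begin{bmatrix} w_1 & & & 0 \\ & w_2 & & \\ & & \ddots & \\ 0 & & & w_n \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} w_1y_1 \\ w_2y_2 \\ \vdots \\ w_ny_n \end{bmatrix} \]

with a similar expression for $W\hat{\mathbf{y}}$. Observe that the $j$th term in (2) can be written as

\[ w_j^2(y_j - \hat{y}_j)^2 = (w_jy_j - w_j\hat{y}_j)^2 \]

It follows that the weighted SS(E) in (2) is the square of the ordinary length in $\mathbb{R}^n$ of
$W\mathbf{y} - W\hat{\mathbf{y}}$, which we write as $\|W\mathbf{y} - W\hat{\mathbf{y}}\|^2$.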

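Now suppose the approximating vector $\hat{\mathbf{y}}$ is to be constructed from the columns of
a matrix $A$. Then we seek an $\hat{\mathbf{x}}$ that makes $A\hat{\mathbf{x}} = \hat{\mathbf{y}}$ as close to $\mathbf{y}$ as possible. However,
the measure of closeness is the weighted error,

\[ \|W\mathbf{y} - W\hat{\mathbf{y}}\|^2 = \|W\mathbf{y} - WA\hat{\mathbf{x}}\|^2 \]

Thus $\hat{\mathbf{x}}$ is the (ordinary) least-squares solution of the equation

\[ WA\mathbf{x} = W\mathbf{y} \]

The normal equation for the least-squares solution is

\[ (WA)^TWA\mathbf{x} = (WA)^TW\mathbf{y} \]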


EXAMPLE 1 Find the least-squares line $y = \beta_0 + \beta_1x$ that best fits the data
$(-2, 3)$, $(-1, 5)$, $(0, 5)$, $(1, 4)$, and $(2, 3)$. Suppose the errors in measuring the $y$-values
of the last two data points are greater than for the other points. Weight these data half as
much as the rest of the data.


¹ Note for readers with a background in statistics: Suppose the errors in measuring the $y_i$ are independent
random variables with means equal to zero and variances of $\sigma_1^2, \ldots, \sigma_n^2$. Then the appropriate weights in (2)
are $w_i^2 = 1/\sigma_i^2$. The larger the variance of the error, the smaller the weight.
FIGURE 1

Weighted and ordinary least-squares lines, $y = 4.3 + .2x$ and $y = 4 - .1x$.


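SOLUTION As in Section 6.6, write $X$ for the matrix $A$ and $\boldsymbol{\beta}$ for the vector $\mathbf{x}$, and
obtain

\[ X = \begin{bmatrix} 1 & -2 \\ 1 & -1 \\ 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}, \quad \boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} 3 \\ 5 \\ 5 \\ 4 \\ 3 \end{bmatrix} \]

For a weighting matrix, choose $W$ with diagonal entries 2, 2, 2, 1, and 1. Left
multiplication by $W$ scales the rows of $X$ and $\mathbf{y}$:

\[ WX = \begin{bmatrix} 2 & -4 \\ 2 & -2 \\ 2 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}, \quad W\mathbf{y} = \begin{bmatrix} 6 \\ 10 \\ 10 \\ 4 \\ 3 \end{bmatrix} \]

For the normal equation, compute

\[ (WX)^TWX = \begin{bmatrix} 14 & -9 \\ -9 & 25 \end{bmatrix} \quad \text{and} \quad (WX)^TW\mathbf{y} = \begin{bmatrix} 59 \\ -34 \end{bmatrix} \]

and solve

\[ \begin{bmatrix} 14 & -9 \\ -9 & 25 \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} = \begin{bmatrix} 59 \\ -34 \end{bmatrix} \]

The solution of the normal equation is (to two significant digits) $\beta_0 = 4.3$ and $\beta_1 = .20$.
The desired line is

\[ y = 4.3 + .20x \]

In contrast, the ordinary least-squares line for these data is

\[ y = 4.0 - .10x \]

Both lines are displayed in Figure 1. ■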

Trend Analysis of Data 

Let $f$ represent an unknown function whose values are known (perhaps only approximately)
at $t_0, \ldots, t_n$. If there is a “linear trend” in the data $f(t_0), \ldots, f(t_n)$, then we
might expect to approximate the values of $f$ by a function of the form $\beta_0 + \beta_1t$.
If there is a “quadratic trend” to the data, then we would try a function of the form
$\beta_0 + \beta_1t + \beta_2t^2$. This was discussed in Section 6.6, from a different point of view.

In some statistical problems, it is important to be able to separate the linear trend 
from the quadratic trend (and possibly cubic or higher-order trends). For instance, 
suppose engineers are analyzing the performance of a new car, and f(t) represents 
the distance between the car at time t and some reference point. If the car is traveling 
at constant velocity, then the graph of f(t ) should be a straight line whose slope is the 
car’s velocity. If the gas pedal is suddenly pressed to the floor, the graph of f(t) will 
change to include a quadratic term and possibly a cubic term (due to the acceleration). 
To analyze the ability of the car to pass another car, for example, engineers may want 
to separate the quadratic and cubic components from the linear term. 

If the function is approximated by a curve of the form $y = \beta_0 + \beta_1t + \beta_2t^2$, the
coefficient $\beta_2$ may not give the desired information about the quadratic trend in the data,
because it may not be “independent” in a statistical sense from the other $\beta_i$. To make
what is known as a trend analysis of the data, we introduce an inner product on the
space $\mathbb{P}_n$ analogous to that given in Example 2 in Section 6.7. For $p$, $q$ in $\mathbb{P}_n$, define

\[ \langle p, q \rangle = p(t_0)q(t_0) + \cdots + p(t_n)q(t_n) \]

In practice, statisticians seldom need to consider trends in data of degree higher than
cubic or quartic. So let $p_0$, $p_1$, $p_2$, $p_3$ denote an orthogonal basis of the subspace $\mathbb{P}_3$ of
$\mathbb{P}_n$, obtained by applying the Gram-Schmidt process to the polynomials $1$, $t$, $t^2$, and $t^3$.
By Supplementary Exercise 11 in Chapter 2, there is a polynomial $g$ in $\mathbb{P}_n$ whose values
at $t_0, \ldots, t_n$ coincide with those of the unknown function $f$. Let $\hat{g}$ be the orthogonal
projection (with respect to the given inner product) of $g$ onto $\mathbb{P}_3$, say,

\[ \hat{g} = c_0p_0 + c_1p_1 + c_2p_2 + c_3p_3 \]

Then $\hat{g}$ is called a cubic trend function, and $c_0, \ldots, c_3$ are the trend coefficients of
the data. The coefficient $c_1$ measures the linear trend, $c_2$ the quadratic trend, and $c_3$ the
cubic trend. It turns out that if the data have certain properties, these coefficients are
statistically independent.

Since $p_0, \ldots, p_3$ are orthogonal, the trend coefficients may be computed one
at a time, independently of one another. (Recall that $c_i = \langle g, p_i \rangle/\langle p_i, p_i \rangle$.) We can
ignore $p_3$ and $c_3$ if we want only the quadratic trend. And if, for example, we needed
to determine the quartic trend, we would have to find (via Gram-Schmidt) only a
polynomial $p_4$ in $\mathbb{P}_4$ that is orthogonal to $\mathbb{P}_3$ and compute $\langle g, p_4 \rangle/\langle p_4, p_4 \rangle$.

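EXAMPLE 2 The simplest and most common use of trend analysis occurs when the
points $t_0, \ldots, t_n$ can be adjusted so that they are evenly spaced and sum to zero. Fit a
quadratic trend function to the data $(-2, 3)$, $(-1, 5)$, $(0, 5)$, $(1, 4)$, and $(2, 3)$.

SOLUTION The $t$-coordinates are suitably scaled to use the orthogonal polynomials
found in Example 5 of Section 6.7:

\[ \begin{array}{lcccc} \text{Polynomial:} & p_0 & p_1 & p_2 & \text{Data: } g \\[4pt] \text{Vector of values:} & \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}, & \begin{bmatrix} -2 \\ -1 \\ 0 \\ 1 \\ 2 \end{bmatrix}, & \begin{bmatrix} 2 \\ -1 \\ -2 \\ -1 \\ 2 \end{bmatrix}, & \begin{bmatrix} 3 \\ 5 \\ 5 \\ 4 \\ 3 \end{bmatrix} \end{array} \]

The calculations involve only these vectors, not the specific formulas for the orthogonal
polynomials. The best approximation to the data by polynomials in $\mathbb{P}_2$ is the orthogonal
projection given by

\[ \begin{aligned} \hat{p} &= \frac{\langle g, p_0 \rangle}{\langle p_0, p_0 \rangle}p_0 + \frac{\langle g, p_1 \rangle}{\langle p_1, p_1 \rangle}p_1 + \frac{\langle g, p_2 \rangle}{\langle p_2, p_2 \rangle}p_2 \\ &= \tfrac{20}{5}p_0 - \tfrac{1}{10}p_1 - \tfrac{7}{14}p_2 \end{aligned} \]

and

\[ \hat{p}(t) = 4 - .1t - .5(t^2 - 2) \tag{3} \]

Since the coefficient of $p_2$ is not extremely small, it would be reasonable to conclude
that the trend is at least quadratic. This is confirmed by the graph in Figure 2. ■

FIGURE 2

Approximation by a quadratic trend function.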
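As the solution notes, the trend coefficients depend only on the value vectors. A minimal sketch of that computation follows, assuming the orthogonal value vectors from Example 5 of Section 6.7; the function and variable names are illustrative.

```python
from fractions import Fraction

def dot(x, y):
    """Standard dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))

def trend_coefficients(data, basis):
    """Each c_i = <g, p_i>/<p_i, p_i>, computed independently of the others."""
    return [Fraction(dot(data, b), dot(b, b)) for b in basis]

# Value vectors of p0, p1, p2 at t = -2, -1, 0, 1, 2, and the data values g.
basis = [[1, 1, 1, 1, 1], [-2, -1, 0, 1, 2], [2, -1, -2, -1, 2]]
g = [3, 5, 5, 4, 3]

c = trend_coefficients(g, basis)
# Values of the quadratic trend function at the five data points.
fitted = [sum(ci * b[i] for ci, b in zip(c, basis)) for i in range(len(g))]
print(c, fitted)
```

Dropping the last basis vector from `basis` gives the linear trend alone without changing the first two coefficients, which is the practical benefit of using orthogonal polynomials.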
Fourier Series (Calculus required)

Continuous functions are often approximated by linear combinations of sine and cosine
functions. For instance, a continuous function might represent a sound wave, an electric
signal of some type, or the movement of a vibrating mechanical system.

For simplicity, we consider functions on $0 \le t \le 2\pi$. It turns out that any function
in $C[0, 2\pi]$ can be approximated as closely as desired by a function of the form

\[ \frac{a_0}{2} + a_1\cos t + \cdots + a_n\cos nt + b_1\sin t + \cdots + b_n\sin nt \tag{4} \]

for a sufficiently large value of $n$. The function (4) is called a trigonometric polynomial.
If $a_n$ and $b_n$ are not both zero, the polynomial is said to be of order $n$. The
connection between trigonometric polynomials and other functions in $C[0, 2\pi]$ depends
on the fact that for any $n \ge 1$, the set

\[ \{1, \cos t, \cos 2t, \ldots, \cos nt, \sin t, \sin 2t, \ldots, \sin nt\} \tag{5} \]

is orthogonal with respect to the inner product

\[ \langle f, g \rangle = \int_0^{2\pi} f(t)g(t)\,dt \tag{6} \]

This orthogonality is verified as in the following example and in Exercises 5 and 6.


EXAMPLE 3 Let $C[0, 2\pi]$ have the inner product (6), and let $m$ and $n$ be unequal
positive integers. Show that $\cos mt$ and $\cos nt$ are orthogonal.

SOLUTION Use a trigonometric identity. When $m \ne n$,

\[ \begin{aligned} \langle \cos mt, \cos nt \rangle &= \int_0^{2\pi} \cos mt \cos nt\,dt \\ &= \frac{1}{2}\int_0^{2\pi} [\cos(mt + nt) + \cos(mt - nt)]\,dt \\ &= \frac{1}{2}\left[ \frac{\sin(mt + nt)}{m + n} + \frac{\sin(mt - nt)}{m - n} \right]_0^{2\pi} = 0 \end{aligned} \]

■


Let $W$ be the subspace of $C[0, 2\pi]$ spanned by the functions in (5). Given $f$ in
$C[0, 2\pi]$, the best approximation to $f$ by functions in $W$ is called the $n$th-order Fourier
approximation to $f$ on $[0, 2\pi]$. Since the functions in (5) are orthogonal, the best
approximation is given by the orthogonal projection onto $W$. In this case, the coefficients
$a_k$ and $b_k$ in (4) are called the Fourier coefficients of $f$. The standard formula for an
orthogonal projection shows that

\[ a_k = \frac{\langle f, \cos kt \rangle}{\langle \cos kt, \cos kt \rangle}, \quad b_k = \frac{\langle f, \sin kt \rangle}{\langle \sin kt, \sin kt \rangle}, \quad k \ge 1 \]

Exercise 7 asks you to show that $\langle \cos kt, \cos kt \rangle = \pi$ and $\langle \sin kt, \sin kt \rangle = \pi$. Thus

\[ a_k = \frac{1}{\pi}\int_0^{2\pi} f(t)\cos kt\,dt, \quad b_k = \frac{1}{\pi}\int_0^{2\pi} f(t)\sin kt\,dt \tag{7} \]

The coefficient of the (constant) function 1 in the orthogonal projection is

\[ \frac{\langle f, 1 \rangle}{\langle 1, 1 \rangle} = \frac{1}{2\pi}\int_0^{2\pi} f(t) \cdot 1\,dt = \frac{1}{2}\left[ \frac{1}{\pi}\int_0^{2\pi} f(t)\cos(0 \cdot t)\,dt \right] = \frac{a_0}{2} \]

where $a_0$ is defined by (7) for $k = 0$. This explains why the constant term in (4) is written
as $a_0/2$.
EXAMPLE 4 Find the $n$th-order Fourier approximation to the function $f(t) = t$ on
the interval $[0, 2\pi]$.

SOLUTION Compute

\[ \frac{a_0}{2} = \frac{1}{2} \cdot \frac{1}{\pi}\int_0^{2\pi} t\,dt = \frac{1}{2\pi}\left[ \frac{1}{2}t^2 \right]_0^{2\pi} = \pi \]

and for $k > 0$, using integration by parts,

\[ \begin{aligned} a_k &= \frac{1}{\pi}\int_0^{2\pi} t\cos kt\,dt = \frac{1}{\pi}\left[ \frac{1}{k^2}\cos kt + \frac{t}{k}\sin kt \right]_0^{2\pi} = 0 \\ b_k &= \frac{1}{\pi}\int_0^{2\pi} t\sin kt\,dt = \frac{1}{\pi}\left[ \frac{1}{k^2}\sin kt - \frac{t}{k}\cos kt \right]_0^{2\pi} = -\frac{2}{k} \end{aligned} \]

Thus the $n$th-order Fourier approximation of $f(t) = t$ is

\[ \pi - 2\sin t - \sin 2t - \frac{2}{3}\sin 3t - \cdots - \frac{2}{n}\sin nt \]

Figure 3 shows the third- and fourth-order Fourier approximations of $f$. ■
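The Fourier coefficients in (7) can be approximated numerically when an antiderivative is inconvenient. The sketch below is an illustration only: it assumes a midpoint-rule quadrature with a fixed number of sample points (both choices made here, not taken from the text) and compares the results for $f(t) = t$ with the exact values found in Example 4.

```python
import math

def fourier_coefficients(f, order, samples=2000):
    """Approximate a_0, (a_1..a_order), (b_1..b_order) from (7) on [0, 2*pi]."""
    dt = 2 * math.pi / samples
    ts = [(j + 0.5) * dt for j in range(samples)]  # midpoints of the subintervals

    def integrate(g):
        # Midpoint-rule approximation of the integral of g over [0, 2*pi].
        return sum(g(t) for t in ts) * dt

    a0 = integrate(f) / math.pi
    a = [integrate(lambda t, k=k: f(t) * math.cos(k * t)) / math.pi
         for k in range(1, order + 1)]
    b = [integrate(lambda t, k=k: f(t) * math.sin(k * t)) / math.pi
         for k in range(1, order + 1)]
    return a0, a, b

a0, a, b = fourier_coefficients(lambda t: t, order=3)
print(a0 / 2, a, b)
```

For $f(t) = t$ the computed values should be close to $a_0/2 = \pi$, $a_k = 0$, and $b_k = -2/k$; the agreement improves as `samples` increases.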



FIGURE 3 Fourier approximations of the function $f(t) = t$: (a) third order; (b) fourth order.


The norm of the difference between $f$ and a Fourier approximation is called the
mean square error in the approximation. (The term mean refers to the fact that the norm
is determined by an integral.) It can be shown that the mean square error approaches
zero as the order of the Fourier approximation increases. For this reason, it is common
to write

\[ f(t) = \frac{a_0}{2} + \sum_{m=1}^{\infty} (a_m\cos mt + b_m\sin mt) \]

This expression for $f(t)$ is called the Fourier series for $f$ on $[0, 2\pi]$. The term
$a_m\cos mt$, for example, is the projection of $f$ onto the one-dimensional subspace
spanned by $\cos mt$.

PRACTICE PROBLEMS 

1. Let $q_1(t) = 1$, $q_2(t) = t$, and $q_3(t) = 3t^2 - 4$. Verify that $\{q_1, q_2, q_3\}$ is an orthogonal
set in $C[-2, 2]$ with the inner product of Example 7 in Section 6.7 (integration
from $-2$ to $2$).

2. Find the first-order and third-order Fourier approximations to

\[ f(t) = 3 - 2\sin t + 5\sin 2t - 6\cos 2t \]
6.8 EXERCISES

1. Find the least-squares line $y = \beta_0 + \beta_1x$ that best fits the
data $(-2, 0)$, $(-1, 0)$, $(0, 2)$, $(1, 4)$, and $(2, 4)$, assuming that
the first and last data points are less reliable. Weight them half
as much as the three interior points.

2. Suppose 5 out of 25 data points in a weighted least-squares
problem have a $y$-measurement that is less reliable than the
others, and they are to be weighted half as much as the other
20 points. One method is to weight the 20 points by a factor
of 1 and the other 5 by a factor of $\tfrac{1}{2}$. A second method is
to weight the 20 points by a factor of 2 and the other 5 by
a factor of 1. Do the two methods produce different results?
Explain.

3. Fit a cubic trend function to the data in Example 2. The
orthogonal cubic polynomial is $p_3(t) = \tfrac{5}{6}t^3 - \tfrac{17}{6}t$.

4. To make a trend analysis of six evenly spaced data points, one
can use orthogonal polynomials with respect to evaluation at
the points $t = -5, -3, -1, 1, 3$, and $5$.

a. Show that the first three orthogonal polynomials are

$p_0(t) = 1$, $p_1(t) = t$, and $p_2(t) = \tfrac{3}{8}t^2 - \tfrac{35}{8}$

(The polynomial $p_2$ has been scaled so that its values at
the evaluation points are small integers.)

b. Fit a quadratic trend function to the data
$(-5, 1)$, $(-3, 1)$, $(-1, 4)$, $(1, 4)$, $(3, 6)$, $(5, 8)$

In Exercises 5-14, the space is $C[0, 2\pi]$ with the inner product (6).

5. Show that $\sin mt$ and $\sin nt$ are orthogonal when $m \ne n$.

6. Show that $\sin mt$ and $\cos nt$ are orthogonal for all positive
integers $m$ and $n$.

7. Show that $\|\cos kt\|^2 = \pi$ and $\|\sin kt\|^2 = \pi$ for $k > 0$.

8. Find the third-order Fourier approximation to $f(t) = t - 1$.

9. Find the third-order Fourier approximation to $f(t) = 2\pi - t$.

10. Find the third-order Fourier approximation to the square
wave function $f(t) = 1$ for $0 \le t < \pi$ and $f(t) = -1$ for
$\pi \le t < 2\pi$.

11. Find the third-order Fourier approximation to $\sin^2 t$, without
performing any integration calculations.

12. Find the third-order Fourier approximation to $\cos^3 t$, without
performing any integration calculations.

13. Explain why a Fourier coefficient of the sum of two functions
is the sum of the corresponding Fourier coefficients of the
two functions.

14. Suppose the first few Fourier coefficients of some function
$f$ in $C[0, 2\pi]$ are $a_0$, $a_1$, $a_2$, and $b_1$, $b_2$, $b_3$. Which of the
following trigonometric polynomials is closer to $f$? Defend
your answer.

\[ \begin{aligned} g(t) &= \frac{a_0}{2} + a_1\cos t + a_2\cos 2t + b_1\sin t \\ h(t) &= \frac{a_0}{2} + a_1\cos t + a_2\cos 2t + b_1\sin t + b_2\sin 2t \end{aligned} \]

15. [M] Refer to the data in Exercise 13 in Section 6.6, concerning
the takeoff performance of an airplane. Suppose the
possible measurement errors become greater as the speed of
the airplane increases, and let $W$ be the diagonal weighting
matrix whose diagonal entries are 1, 1, 1, .9, .9, .8, .7, .6, .5,
.4, .3, .2, and .1. Find the cubic curve that fits the data with
minimum weighted least-squares error, and use it to estimate
the velocity of the plane when $t = 4.5$ seconds.

16. [M] Let $f_4$ and $f_5$ be the fourth-order and fifth-order Fourier
approximations in $C[0, 2\pi]$ to the square wave function in
Exercise 10. Produce separate graphs of $f_4$ and $f_5$ on the
interval $[0, 2\pi]$, and produce a graph of $f_5$ on $[-2\pi, 2\pi]$.



The Linearity of an Orthogonal Projection 6-25 


SOLUTIONS TO PRACTICE PROBLEMS 


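1. Compute

\[ \begin{aligned} \langle q_1, q_2 \rangle &= \int_{-2}^{2} 1 \cdot t\,dt = \frac{1}{2}t^2\Big|_{-2}^{2} = 0 \\ \langle q_1, q_3 \rangle &= \int_{-2}^{2} 1 \cdot (3t^2 - 4)\,dt = (t^3 - 4t)\Big|_{-2}^{2} = 0 \\ \langle q_2, q_3 \rangle &= \int_{-2}^{2} t \cdot (3t^2 - 4)\,dt = \left( \frac{3}{4}t^4 - 2t^2 \right)\Big|_{-2}^{2} = 0 \end{aligned} \]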
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y 



First- and third-order 
approximations to /( t ). 


2. The third-order Fourier approximation to $f$ is the best approximation in $C[0, 2\pi]$
to $f$ by functions (vectors) in the subspace spanned by 1, $\cos t$, $\cos 2t$, $\cos 3t$,
$\sin t$, $\sin 2t$, and $\sin 3t$. But $f$ is obviously in this subspace, so $f$ is its own best
approximation:

\[ f(t) = 3 - 2 \sin t + 5 \sin 2t - 6 \cos 2t \]

For the first-order approximation, the closest function to $f$ in the subspace $W =$
Span$\{1, \cos t, \sin t\}$ is $3 - 2 \sin t$. The other two terms in the formula for $f(t)$ are
orthogonal to the functions in $W$, so they contribute nothing to the integrals that
give the Fourier coefficients for a first-order approximation.
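
The conclusion of Practice Problem 2 can be cross-checked numerically. The sketch below is only an illustration: the helper names `integrate` and `fourier_coeffs` and the sample count are assumptions, and the integrals that define the Fourier coefficients are approximated with a midpoint rule rather than computed exactly.

```python
import math

def f(t):
    # The function from Practice Problem 2.
    return 3 - 2 * math.sin(t) + 5 * math.sin(2 * t) - 6 * math.cos(2 * t)

def integrate(g, a, b, n=2000):
    # Midpoint-rule approximation of the integral of g over [a, b].
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def fourier_coeffs(func, order):
    # Returns a_0/2 (the constant term) and the lists [a_1..a_order], [b_1..b_order],
    # where a_k multiplies cos(kt) and b_k multiplies sin(kt) on [0, 2*pi].
    two_pi = 2 * math.pi
    a0_half = integrate(func, 0.0, two_pi) / two_pi
    a = [integrate(lambda t, k=k: func(t) * math.cos(k * t), 0.0, two_pi) / math.pi
         for k in range(1, order + 1)]
    b = [integrate(lambda t, k=k: func(t) * math.sin(k * t), 0.0, two_pi) / math.pi
         for k in range(1, order + 1)]
    return a0_half, a, b

a0_half, a, b = fourier_coeffs(f, 3)
```

Up to rounding, the constant term is 3, the cosine coefficients are 0, -6, 0, and the sine coefficients are -2, 5, 0, so the third-order approximation reproduces $f$ itself and the first-order approximation keeps only $3 - 2 \sin t$.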


CHAPTER 6 SUPPLEMENTARY EXERCISES 


1. The following statements refer to vectors in $\mathbb{R}^n$ (or $\mathbb{R}^m$)
with the standard inner product. Mark each statement True
or False. Justify each answer.

a. The length of every vector is a positive number.

b. A vector $\mathbf{v}$ and its negative, $-\mathbf{v}$, have equal lengths.

c. The distance between $\mathbf{u}$ and $\mathbf{v}$ is $\|\mathbf{u} - \mathbf{v}\|$.

d. If $r$ is any scalar, then $\|r\mathbf{v}\| = r\|\mathbf{v}\|$.

e. If two vectors are orthogonal, they are linearly independent.

f. If $\mathbf{x}$ is orthogonal to both $\mathbf{u}$ and $\mathbf{v}$, then $\mathbf{x}$ must be
orthogonal to $\mathbf{u} - \mathbf{v}$.

g. If $\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2$, then $\mathbf{u}$ and $\mathbf{v}$ are orthogonal.

h. If $\|\mathbf{u} - \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2$, then $\mathbf{u}$ and $\mathbf{v}$ are orthogonal.

i. The orthogonal projection of $\mathbf{y}$ onto $\mathbf{u}$ is a scalar multiple
of $\mathbf{y}$.

j. If a vector $\mathbf{y}$ coincides with its orthogonal projection onto
a subspace $W$, then $\mathbf{y}$ is in $W$.

k. The set of all vectors in $\mathbb{R}^n$ orthogonal to one fixed vector
is a subspace of $\mathbb{R}^n$.

l. If $W$ is a subspace of $\mathbb{R}^n$, then $W$ and $W^\perp$ have no
vectors in common.

m. If $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is an orthogonal set and if $c_1$, $c_2$, and $c_3$ are
scalars, then $\{c_1\mathbf{v}_1, c_2\mathbf{v}_2, c_3\mathbf{v}_3\}$ is an orthogonal set.

n. If a matrix $U$ has orthonormal columns, then $UU^T = I$.

o. A square matrix with orthogonal columns is an orthogonal
matrix.

p. If a square matrix has orthonormal columns, then it also
has orthonormal rows.

q. If $W$ is a subspace, then $\|\operatorname{proj}_W \mathbf{v}\|^2 + \|\mathbf{v} - \operatorname{proj}_W \mathbf{v}\|^2 = \|\mathbf{v}\|^2$.

r. A least-squares solution of $A\mathbf{x} = \mathbf{b}$ is the vector $A\hat{\mathbf{x}}$ in
Col $A$ closest to $\mathbf{b}$, so that $\|\mathbf{b} - A\hat{\mathbf{x}}\| \le \|\mathbf{b} - A\mathbf{x}\|$ for
all $\mathbf{x}$.

s. The normal equations for a least-squares solution of
$A\mathbf{x} = \mathbf{b}$ are given by $\hat{\mathbf{x}} = (A^T A)^{-1} A^T \mathbf{b}$.

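2. Let $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ be an orthonormal set. Verify the following
equality by induction, beginning with $p = 2$. If $\mathbf{x} = c_1\mathbf{v}_1 + \cdots + c_p\mathbf{v}_p$, then

\[ \|\mathbf{x}\|^2 = |c_1|^2 + \cdots + |c_p|^2 \]

3. Let $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ be an orthonormal set in $\mathbb{R}^n$. Verify the
following inequality, called Bessel's inequality, which is true
for each $\mathbf{x}$ in $\mathbb{R}^n$:

\[ \|\mathbf{x}\|^2 \ge |\mathbf{x} \cdot \mathbf{v}_1|^2 + |\mathbf{x} \cdot \mathbf{v}_2|^2 + \cdots + |\mathbf{x} \cdot \mathbf{v}_p|^2 \]

4. Let $U$ be an $n \times n$ orthogonal matrix. Show that if
$\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ is an orthonormal basis for $\mathbb{R}^n$, then so is
$\{U\mathbf{v}_1, \ldots, U\mathbf{v}_n\}$.

5. Show that if an $n \times n$ matrix $U$ satisfies $(U\mathbf{x}) \cdot (U\mathbf{y}) = \mathbf{x} \cdot \mathbf{y}$
for all $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$, then $U$ is an orthogonal matrix.

6. Show that if $U$ is an orthogonal matrix, then any real eigenvalue
of $U$ must be $\pm 1$.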

7. A Householder matrix, or an elementary reflector, has the
form $Q = I - 2\mathbf{u}\mathbf{u}^T$, where $\mathbf{u}$ is a unit vector. (See Exercise 13
in the Supplementary Exercises for Chapter 2.) Show
that $Q$ is an orthogonal matrix. (Elementary reflectors are often
used in computer programs to produce a QR factorization
of a matrix $A$. If $A$ has linearly independent columns, then
left-multiplication by a sequence of elementary reflectors can
produce an upper triangular matrix.)

8. Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a linear transformation that preserves
lengths; that is, $\|T(\mathbf{x})\| = \|\mathbf{x}\|$ for all $\mathbf{x}$ in $\mathbb{R}^n$.

a. Show that $T$ also preserves orthogonality; that is,
$T(\mathbf{x}) \cdot T(\mathbf{y}) = 0$ whenever $\mathbf{x} \cdot \mathbf{y} = 0$.

b. Show that the standard matrix of $T$ is an orthogonal
matrix.

9. Let $\mathbf{u}$ and $\mathbf{v}$ be linearly independent vectors in $\mathbb{R}^n$ that are
not orthogonal. Describe how to find the best approximation
to $\mathbf{z}$ in $\mathbb{R}^n$ by vectors of the form $x_1\mathbf{u} + x_2\mathbf{v}$ without first
constructing an orthogonal basis for Span$\{\mathbf{u}, \mathbf{v}\}$.

10. Suppose the columns of $A$ are linearly independent. Determine
what happens to the least-squares solution $\hat{\mathbf{x}}$ of $A\mathbf{x} = \mathbf{b}$
when $\mathbf{b}$ is replaced by $c\mathbf{b}$ for some nonzero scalar $c$.

11. If $a$, $b$, and $c$ are distinct numbers, then the following
system is inconsistent because the graphs of the equations
are parallel planes. Show that the set of all least-squares
solutions of the system is precisely the plane whose equation
is $x - 2y + 5z = (a + b + c)/3$.

\[ \begin{aligned} x - 2y + 5z &= a \\ x - 2y + 5z &= b \\ x - 2y + 5z &= c \end{aligned} \]

12. Consider the problem of finding an eigenvalue of an $n \times n$
matrix $A$ when an approximate eigenvector $\mathbf{v}$ is known. Since
$\mathbf{v}$ is not exactly correct, the equation

\[ A\mathbf{v} = \lambda\mathbf{v} \qquad (1) \]

will probably not have a solution. However, $\lambda$ can be estimated
by a least-squares solution when (1) is viewed properly.
Think of $\mathbf{v}$ as an $n \times 1$ matrix $V$, think of $\lambda$ as a
vector in $\mathbb{R}^1$, and denote the vector $A\mathbf{v}$ by the symbol $\mathbf{b}$.
Then (1) becomes $\mathbf{b} = \lambda V$, which may also be written as
$V\lambda = \mathbf{b}$. Find the least-squares solution of this system of
$n$ equations in the one unknown $\lambda$, and write this solution
using the original symbols. The resulting estimate for $\lambda$
is called a Rayleigh quotient. See Exercises 11 and 12 in
Section 5.8.

13. Use the steps below to prove the following relations among
the four fundamental subspaces determined by an $m \times n$
matrix $A$.

\[ \operatorname{Row} A = (\operatorname{Nul} A)^\perp, \qquad \operatorname{Col} A = (\operatorname{Nul} A^T)^\perp \]

a. Show that Row $A$ is contained in $(\operatorname{Nul} A)^\perp$. (Show that
if $\mathbf{x}$ is in Row $A$, then $\mathbf{x}$ is orthogonal to every $\mathbf{u}$ in
Nul $A$.)

b. Suppose rank $A = r$. Find dim Nul $A$ and dim $(\operatorname{Nul} A)^\perp$,
and then deduce from part (a) that Row $A = (\operatorname{Nul} A)^\perp$.
[Hint: Study the exercises for Section 6.3.]

c. Explain why Col $A = (\operatorname{Nul} A^T)^\perp$.

14. Explain why an equation $A\mathbf{x} = \mathbf{b}$ has a solution if and only
if $\mathbf{b}$ is orthogonal to all solutions of the equation $A^T\mathbf{x} = \mathbf{0}$.


Exercises 15 and 16 concern the (real) Schur factorization of an
$n \times n$ matrix $A$ in the form $A = URU^T$, where $U$ is an orthogonal
matrix and $R$ is an $n \times n$ upper triangular matrix.¹

15. Show that if $A$ admits a (real) Schur factorization, $A =
URU^T$, then $A$ has $n$ real eigenvalues, counting multiplicities.

16. Let $A$ be an $n \times n$ matrix with $n$ real eigenvalues, counting
multiplicities, denoted by $\lambda_1, \ldots, \lambda_n$. It can be shown that
$A$ admits a (real) Schur factorization. Parts (a) and (b) show
the key ideas in the proof. The rest of the proof amounts to
repeating (a) and (b) for successively smaller matrices, and
then piecing together the results.

a. Let $\mathbf{u}_1$ be a unit eigenvector corresponding to $\lambda_1$, let
$\mathbf{u}_2, \ldots, \mathbf{u}_n$ be any other vectors such that $\{\mathbf{u}_1, \ldots, \mathbf{u}_n\}$
is an orthonormal basis for $\mathbb{R}^n$, and then let $U =
[\,\mathbf{u}_1 \;\; \mathbf{u}_2 \;\; \cdots \;\; \mathbf{u}_n\,]$. Show that the first column of
$U^T A U$ is $\lambda_1 \mathbf{e}_1$, where $\mathbf{e}_1$ is the first column of the $n \times n$
identity matrix.

b. Part (a) implies that $U^T A U$ has the form shown below.
Explain why the eigenvalues of $A_1$ are $\lambda_2, \ldots, \lambda_n$. [Hint:
See the Supplementary Exercises for Chapter 5.]

\[ U^T A U = \begin{bmatrix} \lambda_1 & * \; \cdots \; * \\ \mathbf{0} & A_1 \end{bmatrix} \]


[M] When the right side of an equation $A\mathbf{x} = \mathbf{b}$ is changed
slightly, say, to $A\mathbf{x} = \mathbf{b} + \Delta\mathbf{b}$ for some vector $\Delta\mathbf{b}$, the solution
changes from $\mathbf{x}$ to $\mathbf{x} + \Delta\mathbf{x}$, where $\Delta\mathbf{x}$ satisfies $A(\Delta\mathbf{x}) = \Delta\mathbf{b}$.
The quotient $\|\Delta\mathbf{b}\| / \|\mathbf{b}\|$ is called the relative change in $\mathbf{b}$ (or
the relative error in $\mathbf{b}$ when $\Delta\mathbf{b}$ represents possible error in the
entries of $\mathbf{b}$). The relative change in the solution is $\|\Delta\mathbf{x}\| / \|\mathbf{x}\|$.
When $A$ is invertible, the condition number of $A$, written as
cond$(A)$, produces a bound on how large the relative change in
$\mathbf{x}$ can be:

\[ \frac{\|\Delta\mathbf{x}\|}{\|\mathbf{x}\|} \le \operatorname{cond}(A) \cdot \frac{\|\Delta\mathbf{b}\|}{\|\mathbf{b}\|} \qquad (2) \]

In Exercises 17-20, solve $A\mathbf{x} = \mathbf{b}$ and $A(\Delta\mathbf{x}) = \Delta\mathbf{b}$, and show
that the inequality (2) holds in each case. (See the discussion of
ill-conditioned matrices in Exercises 41-43 in Section 2.3.)


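17. $A = \begin{bmatrix} 4.5 & 3.1 \\ 1.6 & 1.1 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} 19.249 \\ 6.843 \end{bmatrix}$, $\Delta\mathbf{b} = \begin{bmatrix} .001 \\ -.003 \end{bmatrix}$

18. $A = \begin{bmatrix} 4.5 & 3.1 \\ 1.6 & 1.1 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} .500 \\ -1.407 \end{bmatrix}$, $\Delta\mathbf{b} = \begin{bmatrix} .001 \\ -.003 \end{bmatrix}$

19. $A = \begin{bmatrix} 7 & -6 & -4 & 1 \\ -5 & 1 & 0 & -2 \\ 10 & 11 & 7 & -3 \\ 19 & 9 & 7 & 1 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} .100 \\ 2.888 \\ -1.404 \\ 1.462 \end{bmatrix}$, $\Delta\mathbf{b} = 10^{-4} \begin{bmatrix} .49 \\ -1.28 \\ 5.78 \\ 8.04 \end{bmatrix}$

20. $A = \begin{bmatrix} 7 & -6 & -4 & 1 \\ -5 & 1 & 0 & -2 \\ 10 & 11 & 7 & -3 \\ 19 & 9 & 7 & 1 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} 4.230 \\ -11.043 \\ 49.991 \\ 69.536 \end{bmatrix}$, $\Delta\mathbf{b} = 10^{-4} \begin{bmatrix} .27 \\ 7.76 \\ -3.77 \\ 3.93 \end{bmatrix}$


¹ If complex numbers are allowed, every $n \times n$ matrix $A$ admits a
(complex) Schur factorization, $A = URU^{-1}$, where $R$ is upper triangular
and $U^{-1}$ is the conjugate transpose of $U$. This very useful fact is discussed
in Matrix Analysis, by Roger A. Horn and Charles R. Johnson (Cambridge:
Cambridge University Press, 1985), pp. 79-100.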


















Symmetric Matrices 
and Quadratic Forms 


INTRODUCTORY EXAMPLE 

Multichannel Image Processing 



Around the world in little more than 80 minutes , the two 
Landsat satellites streak silently across the sky in near 
polar orbits, recording images of terrain and coastline, in 
swaths 185 kilometers wide. Every 16 days, each satellite 
passes over almost every square kilometer of the earth’s 
surface, so any location can be monitored every 8 days. 

The Landsat images are useful for many purposes. 
Developers and urban planners use them to study the rate 
and direction of urban growth, industrial development, and 
other changes in land usage. Rural countries can analyze 
soil moisture, classify the vegetation in remote regions, and 
locate inland lakes and streams. Governments can detect 
and assess damage from natural disasters, such as forest 
fires, lava flows, floods, and hurricanes. Environmental 
agencies can identify pollution from smokestacks and 
measure water temperatures in lakes and rivers near power 
plants. 

Sensors aboard the satellite acquire seven simultaneous
images of any region on earth to be studied. The
sensors record energy from separate wavelength bands — 
three in the visible light spectrum and four in infrared and 
thermal bands. Each image is digitized and stored as a 
rectangular array of numbers, each number indicating the 
signal intensity at a corresponding small point (or pixel ) 


on the image. Each of the seven images is one channel of 
a multichannel or multispectral image. 

The seven Landsat images of one fixed region typically 
contain much redundant information, since some features 
will appear in several images. Yet other features, because of 
their color or temperature, may reflect light that is recorded 
by only one or two sensors. One goal of multichannel 
image processing is to view the data in a way that extracts 
information better than studying each image separately. 

Principal component analysis is an effective way 
to suppress redundant information and provide in only 
one or two composite images most of the information 
from the initial data. Roughly speaking, the goal is to 
find a special linear combination of the images, that is, 
a list of weights that at each pixel combine all seven 
corresponding image values into one new value. The 
weights are chosen in a way that makes the range of light 
intensities—the scene variance — in the composite image 
(called the first principal component) greater than that in 
any of the original images. Additional component images 
can also be constructed, by criteria that will be explained
in Section 7.5.

Principal component analysis is illustrated in the
photos below, taken over Railroad Valley, Nevada. Images 
from three Landsat spectral bands are shown in (a)-(c). 
The total information in the three bands is rearranged 
in the three principal component images in (d)-(f). The 
first component (d) displays (or “explains”) 93.5% of the 
scene variance present in the initial data. In this way, the 
three-channel initial data have been reduced to one-channel 


data, with a loss in some sense of only 6.5% of the scene 
variance. 

Earth Satellite Corporation of Rockville, Maryland, 
which kindly supplied the photos shown here, is 
experimenting with images from 224 separate spectral 
bands. Principal component analysis, essential for such 
massive data sets, typically reduces the data to about 15 
usable principal components. 






(a) Spectral band 1: Visible blue. 


(b) Spectral band 4: Near infrared. 


(c) Spectral band 7: Mid-infrared. 




(d) Principal component 1: 93.5%. 


(e) Principal component 2: 5.3%. 


(f) Principal component 3: 1.2%. 


Symmetric matrices arise more often in applications, in one way or another, than any 
other major class of matrices. The theory is rich and beautiful, depending in an essential 
way on both diagonalization from Chapter 5 and orthogonality from Chapter 6. The 
diagonalization of a symmetric matrix, described in Section 7.1, is the foundation for 
the discussion in Sections 7.2 and 7.3 concerning quadratic forms. Section 7.3, in turn, is 
needed for the final two sections on the singular value decomposition and on the image 
processing described in the introductory example. Throughout the chapter, all vectors
and matrices have real entries.

7.1 DIAGONALIZATION OF SYMMETRIC MATRICES


A symmetric matrix is a matrix $A$ such that $A^T = A$. Such a matrix is necessarily square.
Its main diagonal entries are arbitrary, but its other entries occur in pairs, on opposite
sides of the main diagonal.


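EXAMPLE 1 Of the following matrices, only the first three are symmetric:

Symmetric: $\begin{bmatrix} 1 & 0 \\ 0 & -3 \end{bmatrix}$, $\begin{bmatrix} 0 & -1 & 0 \\ -1 & 5 & 8 \\ 0 & 8 & -7 \end{bmatrix}$, $\begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix}$

Nonsymmetric: $\begin{bmatrix} 1 & -3 \\ 3 & 0 \end{bmatrix}$, $\begin{bmatrix} 1 & -4 & 0 \\ -6 & 1 & -4 \\ 0 & -6 & 1 \end{bmatrix}$, $\begin{bmatrix} 5 & 4 & 3 & 2 \\ 4 & 3 & 2 & 1 \\ 3 & 2 & 1 & 0 \end{bmatrix}$
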


To begin the study of symmetric matrices, it is helpful to review the diagonalization 
process of Section 5.3. 


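EXAMPLE 2 If possible, diagonalize the matrix $A = \begin{bmatrix} 6 & -2 & -1 \\ -2 & 6 & -1 \\ -1 & -1 & 5 \end{bmatrix}$.

SOLUTION The characteristic equation of $A$ is

\[ 0 = -\lambda^3 + 17\lambda^2 - 90\lambda + 144 = -(\lambda - 8)(\lambda - 6)(\lambda - 3) \]

Standard calculations produce a basis for each eigenspace:

\[ \lambda = 8: \; \mathbf{v}_1 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}; \qquad \lambda = 6: \; \mathbf{v}_2 = \begin{bmatrix} -1 \\ -1 \\ 2 \end{bmatrix}; \qquad \lambda = 3: \; \mathbf{v}_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \]

These three vectors form a basis for $\mathbb{R}^3$. In fact, it is easy to check that $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is an
orthogonal basis for $\mathbb{R}^3$. Experience from Chapter 6 suggests that an orthonormal basis
might be useful for calculations, so here are the normalized (unit) eigenvectors.

\[ \mathbf{u}_1 = \begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \\ 0 \end{bmatrix}, \qquad \mathbf{u}_2 = \begin{bmatrix} -1/\sqrt{6} \\ -1/\sqrt{6} \\ 2/\sqrt{6} \end{bmatrix}, \qquad \mathbf{u}_3 = \begin{bmatrix} 1/\sqrt{3} \\ 1/\sqrt{3} \\ 1/\sqrt{3} \end{bmatrix} \]

Let

\[ P = \begin{bmatrix} -1/\sqrt{2} & -1/\sqrt{6} & 1/\sqrt{3} \\ 1/\sqrt{2} & -1/\sqrt{6} & 1/\sqrt{3} \\ 0 & 2/\sqrt{6} & 1/\sqrt{3} \end{bmatrix}, \qquad D = \begin{bmatrix} 8 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 3 \end{bmatrix} \]

Then $A = PDP^{-1}$, as usual. But this time, since $P$ is square and has orthonormal
columns, $P$ is an orthogonal matrix, and $P^{-1}$ is simply $P^T$. (See Section 6.2.)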
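
The claims in Example 2 are easy to confirm by machine. The following sketch uses only the Python standard library; the helper names `transpose` and `matmul` are illustrative assumptions, not part of any matrix package. It checks that $P^T P = I$ and that $P^T A P$ is the diagonal matrix with entries 8, 6, 3.

```python
import math

def transpose(M):
    # Rows become columns.
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    # Ordinary matrix product of two lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

A = [[6, -2, -1],
     [-2, 6, -1],
     [-1, -1, 5]]

s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
# Columns of P are the unit eigenvectors u1, u2, u3 of Example 2.
P = [[-1 / s2, -1 / s6, 1 / s3],
     [1 / s2, -1 / s6, 1 / s3],
     [0.0, 2 / s6, 1 / s3]]

PtP = matmul(transpose(P), P)             # approximately the identity
D = matmul(transpose(P), matmul(A, P))    # approximately diag(8, 6, 3)
```

Because $P^{-1} = P^T$, no matrix inversion is needed anywhere in the check.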

Theorem 1 explains why the eigenvectors in Example 2 are orthogonal: they
correspond to distinct eigenvalues.


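THEOREM 1

If $A$ is symmetric, then any two eigenvectors from different eigenspaces are
orthogonal.


PROOF Let $\mathbf{v}_1$ and $\mathbf{v}_2$ be eigenvectors that correspond to distinct eigenvalues, say, $\lambda_1$
and $\lambda_2$. To show that $\mathbf{v}_1 \cdot \mathbf{v}_2 = 0$, compute

\[ \begin{aligned}
\lambda_1 \mathbf{v}_1 \cdot \mathbf{v}_2 &= (\lambda_1 \mathbf{v}_1)^T \mathbf{v}_2 = (A\mathbf{v}_1)^T \mathbf{v}_2 && \text{Since } \mathbf{v}_1 \text{ is an eigenvector} \\
&= (\mathbf{v}_1^T A^T)\mathbf{v}_2 = \mathbf{v}_1^T (A\mathbf{v}_2) && \text{Since } A^T = A \\
&= \mathbf{v}_1^T (\lambda_2 \mathbf{v}_2) && \text{Since } \mathbf{v}_2 \text{ is an eigenvector} \\
&= \lambda_2 \mathbf{v}_1^T \mathbf{v}_2 = \lambda_2 \mathbf{v}_1 \cdot \mathbf{v}_2
\end{aligned} \]

Hence $(\lambda_1 - \lambda_2)\mathbf{v}_1 \cdot \mathbf{v}_2 = 0$. But $\lambda_1 - \lambda_2 \ne 0$, so $\mathbf{v}_1 \cdot \mathbf{v}_2 = 0$.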


The special type of diagonalization in Example 2 is crucial for the theory of symmetric
matrices. An $n \times n$ matrix $A$ is said to be orthogonally diagonalizable if there
are an orthogonal matrix $P$ (with $P^{-1} = P^T$) and a diagonal matrix $D$ such that

\[ A = PDP^T = PDP^{-1} \qquad (1) \]

Such a diagonalization requires $n$ linearly independent and orthonormal eigenvectors.
When is this possible? If $A$ is orthogonally diagonalizable as in (1), then

\[ A^T = (PDP^T)^T = P^{TT} D^T P^T = PDP^T = A \]

Thus $A$ is symmetric! Theorem 2 below shows that, conversely, every symmetric matrix
is orthogonally diagonalizable. The proof is much harder and is omitted; the main idea
for a proof will be given after Theorem 3.


THEOREM 2 


An n x n matrix A is orthogonally diagonalizable if and only if A is a symmetric 
matrix. 


This theorem is rather amazing, because the work in Chapter 5 would suggest that 
it is usually impossible to tell when a matrix is diagonalizable. But this is not the case 
for symmetric matrices. 

The next example treats a matrix whose eigenvalues are not all distinct. 


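EXAMPLE 3 Orthogonally diagonalize the matrix $A = \begin{bmatrix} 3 & -2 & 4 \\ -2 & 6 & 2 \\ 4 & 2 & 3 \end{bmatrix}$, whose
characteristic equation is

\[ 0 = -\lambda^3 + 12\lambda^2 - 21\lambda - 98 = -(\lambda - 7)^2(\lambda + 2) \]

SOLUTION The usual calculations produce bases for the eigenspaces:

\[ \lambda = 7: \; \mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \; \mathbf{v}_2 = \begin{bmatrix} -1/2 \\ 1 \\ 0 \end{bmatrix}; \qquad \lambda = -2: \; \mathbf{v}_3 = \begin{bmatrix} -1 \\ -1/2 \\ 1 \end{bmatrix} \]

Although $\mathbf{v}_1$ and $\mathbf{v}_2$ are linearly independent, they are not orthogonal. Recall from
Section 6.2 that the projection of $\mathbf{v}_2$ onto $\mathbf{v}_1$ is $\dfrac{\mathbf{v}_2 \cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1} \mathbf{v}_1$, and the component of $\mathbf{v}_2$
orthogonal to $\mathbf{v}_1$ is

\[ \mathbf{z}_2 = \mathbf{v}_2 - \frac{\mathbf{v}_2 \cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1} \mathbf{v}_1 = \begin{bmatrix} -1/2 \\ 1 \\ 0 \end{bmatrix} - \frac{-1/2}{2} \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -1/4 \\ 1 \\ 1/4 \end{bmatrix} \]

Then $\{\mathbf{v}_1, \mathbf{z}_2\}$ is an orthogonal set in the eigenspace for $\lambda = 7$. (Note that $\mathbf{z}_2$ is a linear
combination of the eigenvectors $\mathbf{v}_1$ and $\mathbf{v}_2$, so $\mathbf{z}_2$ is in the eigenspace. This construction
of $\mathbf{z}_2$ is just the Gram-Schmidt process of Section 6.4.) Since the eigenspace is
two-dimensional (with basis $\mathbf{v}_1$, $\mathbf{v}_2$), the orthogonal set $\{\mathbf{v}_1, \mathbf{z}_2\}$ is an orthogonal basis for
the eigenspace, by the Basis Theorem. (See Section 2.9 or 4.5.)

Normalize $\mathbf{v}_1$ and $\mathbf{z}_2$ to obtain the following orthonormal basis for the eigenspace
for $\lambda = 7$:

\[ \mathbf{u}_1 = \begin{bmatrix} 1/\sqrt{2} \\ 0 \\ 1/\sqrt{2} \end{bmatrix}, \qquad \mathbf{u}_2 = \begin{bmatrix} -1/\sqrt{18} \\ 4/\sqrt{18} \\ 1/\sqrt{18} \end{bmatrix} \]

An orthonormal basis for the eigenspace for $\lambda = -2$ is

\[ \mathbf{u}_3 = \begin{bmatrix} -2/3 \\ -1/3 \\ 2/3 \end{bmatrix} \]

By Theorem 1, $\mathbf{u}_3$ is orthogonal to the other eigenvectors $\mathbf{u}_1$ and $\mathbf{u}_2$. Hence $\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$
is an orthonormal set. Let

\[ P = [\,\mathbf{u}_1 \;\; \mathbf{u}_2 \;\; \mathbf{u}_3\,] = \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{18} & -2/3 \\ 0 & 4/\sqrt{18} & -1/3 \\ 1/\sqrt{2} & 1/\sqrt{18} & 2/3 \end{bmatrix}, \qquad D = \begin{bmatrix} 7 & 0 & 0 \\ 0 & 7 & 0 \\ 0 & 0 & -2 \end{bmatrix} \]

Then $P$ orthogonally diagonalizes $A$, and $A = PDP^{-1}$. ■

In Example 3, the eigenvalue 7 has multiplicity two and the eigenspace is
two-dimensional. This fact is not accidental, as the next theorem shows.


The Spectral Theorem

The set of eigenvalues of a matrix $A$ is sometimes called the spectrum of $A$, and the
following description of the eigenvalues is called a spectral theorem.


THEOREM 3

The Spectral Theorem for Symmetric Matrices

An $n \times n$ symmetric matrix $A$ has the following properties:

a. $A$ has $n$ real eigenvalues, counting multiplicities.

b. The dimension of the eigenspace for each eigenvalue $\lambda$ equals the multiplicity
of $\lambda$ as a root of the characteristic equation.

c. The eigenspaces are mutually orthogonal, in the sense that eigenvectors
corresponding to different eigenvalues are orthogonal.

d. $A$ is orthogonally diagonalizable.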

Part (a) follows from Exercise 24 in Section 5.5. Part (b) follows easily from part 
(d). (See Exercise 31.) Part (c) is Theorem 1. Because of (a), a proof of (d) can be given 
using Exercise 32 and the Schur factorization discussed in Supplementary Exercise 16 
in Chapter 6 . The details are omitted. 
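
To see parts (b) through (d) of the Spectral Theorem at work on the matrix of Example 3, the following sketch repeats the construction numerically: one Gram-Schmidt step inside the two-dimensional eigenspace for the eigenvalue 7, followed by normalization. It uses only the Python standard library, and the helper names (`dot`, `matvec`, `normalize`) are assumptions chosen for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def normalize(v):
    length = math.sqrt(dot(v, v))
    return [a / length for a in v]

A = [[3, -2, 4],
     [-2, 6, 2],
     [4, 2, 3]]

v1 = [1.0, 0.0, 1.0]       # eigenvector for eigenvalue 7
v2 = [-0.5, 1.0, 0.0]      # eigenvector for eigenvalue 7, not orthogonal to v1
v3 = [-1.0, -0.5, 1.0]     # eigenvector for eigenvalue -2

# One Gram-Schmidt step: subtract from v2 its projection onto v1.
c = dot(v2, v1) / dot(v1, v1)
z2 = [b - c * a for a, b in zip(v1, v2)]   # [-1/4, 1, 1/4]

u1, u2, u3 = normalize(v1), normalize(z2), normalize(v3)
```

The vector `z2` is still an eigenvector for 7, because it is a linear combination of `v1` and `v2`, and the three unit vectors are mutually orthogonal, so they can serve as the columns of $P$.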
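Spectral Decomposition

Suppose $A = PDP^{-1}$, where the columns of $P$ are orthonormal eigenvectors $\mathbf{u}_1, \ldots, \mathbf{u}_n$
of $A$ and the corresponding eigenvalues $\lambda_1, \ldots, \lambda_n$ are in the diagonal matrix $D$. Then,
since $P^{-1} = P^T$,

\[ A = PDP^T = \begin{bmatrix} \mathbf{u}_1 & \cdots & \mathbf{u}_n \end{bmatrix} \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix} \begin{bmatrix} \mathbf{u}_1^T \\ \vdots \\ \mathbf{u}_n^T \end{bmatrix} = \begin{bmatrix} \lambda_1\mathbf{u}_1 & \cdots & \lambda_n\mathbf{u}_n \end{bmatrix} \begin{bmatrix} \mathbf{u}_1^T \\ \vdots \\ \mathbf{u}_n^T \end{bmatrix} \]

Using the column-row expansion of a product (Theorem 10 in Section 2.4), we can
write

\[ A = \lambda_1 \mathbf{u}_1 \mathbf{u}_1^T + \lambda_2 \mathbf{u}_2 \mathbf{u}_2^T + \cdots + \lambda_n \mathbf{u}_n \mathbf{u}_n^T \qquad (2) \]

This representation of $A$ is called a spectral decomposition of $A$ because it breaks
up $A$ into pieces determined by the spectrum (eigenvalues) of $A$. Each term in (2) is
an $n \times n$ matrix of rank 1. For example, every column of $\lambda_1 \mathbf{u}_1 \mathbf{u}_1^T$ is a multiple of $\mathbf{u}_1$.
Furthermore, each matrix $\mathbf{u}_j \mathbf{u}_j^T$ is a projection matrix in the sense that for each $\mathbf{x}$ in
$\mathbb{R}^n$, the vector $(\mathbf{u}_j \mathbf{u}_j^T)\mathbf{x}$ is the orthogonal projection of $\mathbf{x}$ onto the subspace spanned by
$\mathbf{u}_j$. (See Exercise 35.)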

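EXAMPLE 4 Construct a spectral decomposition of the matrix $A$ that has the
orthogonal diagonalization

\[ A = \begin{bmatrix} 7 & 2 \\ 2 & 4 \end{bmatrix} = \begin{bmatrix} 2/\sqrt{5} & -1/\sqrt{5} \\ 1/\sqrt{5} & 2/\sqrt{5} \end{bmatrix} \begin{bmatrix} 8 & 0 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 2/\sqrt{5} & 1/\sqrt{5} \\ -1/\sqrt{5} & 2/\sqrt{5} \end{bmatrix} \]

SOLUTION Denote the columns of $P$ by $\mathbf{u}_1$ and $\mathbf{u}_2$. Then

\[ A = 8\mathbf{u}_1\mathbf{u}_1^T + 3\mathbf{u}_2\mathbf{u}_2^T \]

To verify this decomposition of $A$, compute

\[ \mathbf{u}_1\mathbf{u}_1^T = \begin{bmatrix} 2/\sqrt{5} \\ 1/\sqrt{5} \end{bmatrix} \begin{bmatrix} 2/\sqrt{5} & 1/\sqrt{5} \end{bmatrix} = \begin{bmatrix} 4/5 & 2/5 \\ 2/5 & 1/5 \end{bmatrix} \]

\[ \mathbf{u}_2\mathbf{u}_2^T = \begin{bmatrix} -1/\sqrt{5} \\ 2/\sqrt{5} \end{bmatrix} \begin{bmatrix} -1/\sqrt{5} & 2/\sqrt{5} \end{bmatrix} = \begin{bmatrix} 1/5 & -2/5 \\ -2/5 & 4/5 \end{bmatrix} \]

and

\[ 8\mathbf{u}_1\mathbf{u}_1^T + 3\mathbf{u}_2\mathbf{u}_2^T = \begin{bmatrix} 32/5 & 16/5 \\ 16/5 & 8/5 \end{bmatrix} + \begin{bmatrix} 3/5 & -6/5 \\ -6/5 & 12/5 \end{bmatrix} = \begin{bmatrix} 7 & 2 \\ 2 & 4 \end{bmatrix} = A \quad ■ \]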
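
The spectral decomposition in Example 4 can be rebuilt term by term in a few lines. This is a minimal sketch with the Python standard library; `outer` and `spectral_sum` are hypothetical helper names introduced here, not functions from the text or from any library.

```python
import math

def outer(u, v):
    # The matrix u v^T as a list of rows.
    return [[a * b for b in v] for a in u]

def spectral_sum(eigenvalues, unit_vectors):
    # Adds up the rank-1 pieces lambda_j * u_j u_j^T.
    n = len(unit_vectors[0])
    total = [[0.0] * n for _ in range(n)]
    for lam, u in zip(eigenvalues, unit_vectors):
        term = outer(u, u)
        for i in range(n):
            for j in range(n):
                total[i][j] += lam * term[i][j]
    return total

s5 = math.sqrt(5)
u1 = [2 / s5, 1 / s5]     # unit eigenvector for eigenvalue 8
u2 = [-1 / s5, 2 / s5]    # unit eigenvector for eigenvalue 3

A = spectral_sum([8, 3], [u1, u2])   # approximately [[7, 2], [2, 4]]
```

Dropping the term with the smaller eigenvalue leaves `8 * outer(u1, u1)`, a rank-1 approximation of $A$; that idea returns later in the chapter with the singular value decomposition.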
NUMERICAL NOTE

When $A$ is symmetric and not too large, modern high-performance computer
algorithms calculate eigenvalues and eigenvectors with great precision. They apply
a sequence of similarity transformations to $A$ involving orthogonal matrices. The
diagonal entries of the transformed matrices converge rapidly to the eigenvalues
of $A$. (See the Numerical Notes in Section 5.2.) Using orthogonal matrices
generally prevents numerical errors from accumulating during the process. When $A$ is
symmetric, the sequence of orthogonal matrices combines to form an orthogonal
matrix whose columns are eigenvectors of $A$.

A nonsymmetric matrix cannot have a full set of orthogonal eigenvectors, but
the algorithm still produces fairly accurate eigenvalues. After that, nonorthogonal
techniques are needed to calculate eigenvectors.
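
One way to get a feel for "a sequence of similarity transformations involving orthogonal matrices" is the classical Jacobi rotation method, sketched below. This is an assumption made for illustration: it is not claimed to be the algorithm the note has in mind (production software typically uses QR-type iterations), and the function name `jacobi_eigen` and its parameters are hypothetical. Each step replaces $A$ by $J^T A J$, where $J$ is a plane rotation chosen to zero one off-diagonal entry, and the product of the rotations accumulates in $V$.

```python
import math

def jacobi_eigen(A, sweeps=50, tol=1e-12):
    # Cyclic Jacobi method for a real symmetric matrix given as a list of rows.
    n = len(A)
    A = [[float(x) for x in row] for row in A]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = math.sqrt(sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle with |theta| <= pi/4 that makes the (p, q) entry zero.
                diff = A[q][q] - A[p][p]
                sign = 1.0 if diff >= 0 else -1.0
                theta = 0.5 * math.atan2(2 * A[p][q] * sign, abs(diff))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):          # columns p and q of A J
                    akp, akq = A[k][p], A[k][q]
                    A[k][p] = c * akp - s * akq
                    A[k][q] = s * akp + c * akq
                for k in range(n):          # rows p and q of J^T (A J)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k] = c * apk - s * aqk
                    A[q][k] = s * apk + c * aqk
                for k in range(n):          # accumulate V <- V J
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p] = c * vkp - s * vkq
                    V[k][q] = s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V

# The symmetric matrix of Example 3; its eigenvalues are 7, 7, and -2.
eigenvalues, V = jacobi_eigen([[3, -2, 4], [-2, 6, 2], [4, 2, 3]])
```

The columns of `V` are orthonormal eigenvectors, matched by position with the entries of `eigenvalues`. Because every step is an orthogonal similarity transformation, the symmetry of the matrix is preserved throughout.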


PRACTICE PROBLEMS 

1. Show that if $A$ is a symmetric matrix, then $A^2$ is symmetric.

2. Show that if $A$ is orthogonally diagonalizable, then so is $A^2$.


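7.1 EXERCISES


Determine which of the matrices in Exercises 1-6 are symmetric.

1. $\begin{bmatrix} 3 & 5 \\ 5 & -7 \end{bmatrix}$

2. $\begin{bmatrix} 3 & -5 \\ -5 & -3 \end{bmatrix}$

3. $\begin{bmatrix} 2 & 3 \\ 2 & 4 \end{bmatrix}$

4. $\begin{bmatrix} 0 & 8 & 3 \\ 8 & 0 & -4 \\ 3 & 2 & 0 \end{bmatrix}$

5. $\begin{bmatrix} -6 & 2 & 0 \\ 2 & -6 & 2 \\ 0 & 2 & -6 \end{bmatrix}$

6. $\begin{bmatrix} 1 & 2 & 2 & 1 \\ 2 & 2 & 2 & 1 \\ 2 & 2 & 1 & 2 \end{bmatrix}$

Determine which of the matrices in Exercises 7-12 are orthogonal.
If orthogonal, find the inverse.

7. $\begin{bmatrix} .6 & .8 \\ .8 & -.6 \end{bmatrix}$

8. $\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$

9. $\begin{bmatrix} 4/5 & 3/5 \\ 3/5 & 4/5 \end{bmatrix}$

10. $\begin{bmatrix} 1/3 & 2/3 & 2/3 \\ 2/3 & 1/3 & -2/3 \\ 2/3 & -2/3 & 1/3 \end{bmatrix}$

11. $\begin{bmatrix} 2/3 & 2/3 & 1/3 \\ 0 & 1/3 & -2/3 \\ 5/3 & -4/3 & -2/3 \end{bmatrix}$

12. $\begin{bmatrix} .5 & .5 & -.5 & -.5 \\ .5 & .5 & .5 & .5 \\ .5 & -.5 & -.5 & .5 \\ .5 & -.5 & .5 & -.5 \end{bmatrix}$

Orthogonally diagonalize the matrices in Exercises 13-22, giving
an orthogonal matrix $P$ and a diagonal matrix $D$. To save you
time, the eigenvalues in Exercises 17-22 are: (17) -4, 4, 7; (18)
-3, -6, 9; (19) -2, 7; (20) -3, 15; (21) 1, 5, 9; (22) 3, 5.

13. $\begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}$

14. $\begin{bmatrix} 1 & -5 \\ -5 & 1 \end{bmatrix}$

15. $\begin{bmatrix} 3 & 4 \\ 4 & 9 \end{bmatrix}$

16. $\begin{bmatrix} 6 & -2 \\ -2 & 9 \end{bmatrix}$

17. $\begin{bmatrix} 1 & 1 & 5 \\ 1 & 5 & 1 \\ 5 & 1 & 1 \end{bmatrix}$

18. $\begin{bmatrix} 1 & -6 & 4 \\ -6 & 2 & -2 \\ 4 & -2 & -3 \end{bmatrix}$

19. $\begin{bmatrix} 3 & -2 & 4 \\ -2 & 6 & 2 \\ 4 & 2 & 3 \end{bmatrix}$

20. $\begin{bmatrix} 5 & 8 & -4 \\ 8 & 5 & -4 \\ -4 & -4 & -1 \end{bmatrix}$

21. $\begin{bmatrix} 4 & 3 & 1 & 1 \\ 3 & 4 & 1 & 1 \\ 1 & 1 & 4 & 3 \\ 1 & 1 & 3 & 4 \end{bmatrix}$

22. $\begin{bmatrix} 4 & 0 & 1 & 0 \\ 0 & 4 & 0 & 1 \\ 1 & 0 & 4 & 0 \\ 0 & 1 & 0 & 4 \end{bmatrix}$

23. Let $A = \begin{bmatrix} 4 & -1 & -1 \\ -1 & 4 & -1 \\ -1 & -1 & 4 \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$. Verify that 5 is
an eigenvalue of $A$ and $\mathbf{v}$ is an eigenvector. Then orthogonally
diagonalize $A$.

24. Let $A = \begin{bmatrix} 2 & -1 & 1 \\ -1 & 2 & -1 \\ 1 & -1 & 2 \end{bmatrix}$, $\mathbf{v}_1 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$, and $\mathbf{v}_2 = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}$.
Verify that $\mathbf{v}_1$ and $\mathbf{v}_2$ are eigenvectors of $A$. Then
orthogonally diagonalize $A$.

In Exercises 25 and 26, mark each statement True or False. Justify
each answer.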


25. a. An $n \times n$ matrix that is orthogonally diagonalizable must
be symmetric.

b. If $A^T = A$ and if vectors $\mathbf{u}$ and $\mathbf{v}$ satisfy $A\mathbf{u} = 3\mathbf{u}$ and
$A\mathbf{v} = 4\mathbf{v}$, then $\mathbf{u} \cdot \mathbf{v} = 0$.

c. An $n \times n$ symmetric matrix has $n$ distinct real eigenvalues.

d. For a nonzero $\mathbf{v}$ in $\mathbb{R}^n$, the matrix $\mathbf{v}\mathbf{v}^T$ is called a
projection matrix.


26. a. There are symmetric matrices that are not orthogonally
diagonalizable.

b. If $B = PDP^T$, where $P^T = P^{-1}$ and $D$ is a diagonal
matrix, then $B$ is a symmetric matrix.

c. An orthogonal matrix is orthogonally diagonalizable.

d. The dimension of an eigenspace of a symmetric matrix is
sometimes less than the multiplicity of the corresponding
eigenvalue.


27. Show that if $A$ is an $n \times n$ symmetric matrix, then $(A\mathbf{x}) \cdot \mathbf{y} =
\mathbf{x} \cdot (A\mathbf{y})$ for all $\mathbf{x}$, $\mathbf{y}$ in $\mathbb{R}^n$.


28. Suppose $A$ is a symmetric $n \times n$ matrix and $B$ is any $n \times m$
matrix. Show that $B^T A B$, $B^T B$, and $BB^T$ are symmetric
matrices.


29. Suppose $A$ is invertible and orthogonally diagonalizable.
Explain why $A^{-1}$ is also orthogonally diagonalizable.

30. Suppose $A$ and $B$ are both orthogonally diagonalizable and
$AB = BA$. Explain why $AB$ is also orthogonally diagonalizable.

31. Let $A = PDP^{-1}$, where $P$ is orthogonal and $D$ is diagonal,
and let $\lambda$ be an eigenvalue of $A$ of multiplicity $k$. Then
$\lambda$ appears $k$ times on the diagonal of $D$. Explain why the
dimension of the eigenspace for $\lambda$ is $k$.

32. Suppose $A = PRP^{-1}$, where $P$ is orthogonal and $R$ is upper
triangular. Show that if $A$ is symmetric, then $R$ is symmetric
and hence is actually a diagonal matrix.

33. Construct a spectral decomposition of $A$ from Example 2.

34. Construct a spectral decomposition of $A$ from Example 3.

35. Let $\mathbf{u}$ be a unit vector in $\mathbb{R}^n$, and let $B = \mathbf{u}\mathbf{u}^T$.

a. Given any $\mathbf{x}$ in $\mathbb{R}^n$, compute $B\mathbf{x}$ and show that $B\mathbf{x}$ is
the orthogonal projection of $\mathbf{x}$ onto $\mathbf{u}$, as described in
Section 6.2.

b. Show that $B$ is a symmetric matrix and $B^2 = B$.

c. Show that $\mathbf{u}$ is an eigenvector of $B$. What is the
corresponding eigenvalue?


36. Let $B$ be an $n \times n$ symmetric matrix such that $B^2 = B$. Any
such matrix is called a projection matrix (or an orthogonal
projection matrix). Given any $\mathbf{y}$ in $\mathbb{R}^n$, let $\hat{\mathbf{y}} = B\mathbf{y}$ and
$\mathbf{z} = \mathbf{y} - \hat{\mathbf{y}}$.

a. Show that $\mathbf{z}$ is orthogonal to $\hat{\mathbf{y}}$.

b. Let $W$ be the column space of $B$. Show that $\mathbf{y}$ is the sum
of a vector in $W$ and a vector in $W^\perp$. Why does this prove
that $B\mathbf{y}$ is the orthogonal projection of $\mathbf{y}$ onto the column
space of $B$?


[M] Orthogonally diagonalize the matrices in Exercises 37-40.
To practice the methods of this section, do not use an eigenvector
routine from your matrix program. Instead, use the program to find
the eigenvalues, and, for each eigenvalue $\lambda$, find an orthonormal
basis for Nul$(A - \lambda I)$, as in Examples 2 and 3.



SOLUTIONS TO PRACTICE PROBLEMS 

1. $(A^2)^T = (AA)^T = A^T A^T$, by a property of transposes. By hypothesis, $A^T = A$. So
$(A^2)^T = AA = A^2$, which shows that $A^2$ is symmetric.

2. If $A$ is orthogonally diagonalizable, then $A$ is symmetric, by Theorem 2. By Practice
Problem 1, $A^2$ is symmetric and hence is orthogonally diagonalizable (Theorem 2).

7.2 QUADRATIC FORMS


Until now, our attention in this text has focused on linear equations, except for the sums
of squares encountered in Chapter 6 when computing $\mathbf{x}^T\mathbf{x}$. Such sums and more general
expressions, called quadratic forms, occur frequently in applications of linear algebra
to engineering (in design criteria and optimization) and signal processing (as output
noise power). They also arise, for example, in physics (as potential and kinetic energy),
differential geometry (as normal curvature of surfaces), economics (as utility functions),
and statistics (in confidence ellipsoids). Some of the mathematical background for such
applications flows easily from our work on symmetric matrices.

A quadratic form on $\mathbb{R}^n$ is a function $Q$ defined on $\mathbb{R}^n$ whose value at a vector $\mathbf{x}$
in $\mathbb{R}^n$ can be computed by an expression of the form $Q(\mathbf{x}) = \mathbf{x}^T A \mathbf{x}$, where $A$ is an $n \times n$
symmetric matrix. The matrix $A$ is called the matrix of the quadratic form.

The simplest example of a nonzero quadratic form is $Q(\mathbf{x}) = \mathbf{x}^T I \mathbf{x} = \|\mathbf{x}\|^2$.
Examples 1 and 2 show the connection between any symmetric matrix $A$ and the quadratic
form $\mathbf{x}^T A \mathbf{x}$.


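EXAMPLE 1 Let $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$. Compute $\mathbf{x}^T A \mathbf{x}$ for the following matrices:

a. $A = \begin{bmatrix} 4 & 0 \\ 0 & 3 \end{bmatrix}$ \qquad b. $A = \begin{bmatrix} 3 & -2 \\ -2 & 7 \end{bmatrix}$

SOLUTION

a. \[ \mathbf{x}^T A \mathbf{x} = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 4 & 0 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 4x_1 \\ 3x_2 \end{bmatrix} = 4x_1^2 + 3x_2^2 \]

b. There are two $-2$ entries in $A$. Watch how they enter the calculations. The (1,2)-entry
in $A$ is in boldface type.

\[ \begin{aligned}
\mathbf{x}^T A \mathbf{x} &= \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 3 & \mathbf{-2} \\ -2 & 7 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 3x_1 - 2x_2 \\ -2x_1 + 7x_2 \end{bmatrix} \\
&= x_1(3x_1 - 2x_2) + x_2(-2x_1 + 7x_2) \\
&= 3x_1^2 - 2x_1x_2 - 2x_2x_1 + 7x_2^2 \\
&= 3x_1^2 - 4x_1x_2 + 7x_2^2
\end{aligned} \]

The presence of $-4x_1x_2$ in the quadratic form in Example 1(b) is due to the $-2$
entries off the diagonal in the matrix $A$. In contrast, the quadratic form associated with
the diagonal matrix $A$ in Example 1(a) has no $x_1x_2$ cross-product term.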
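
A quadratic form is straightforward to evaluate directly from its definition as a double sum. The sketch below is illustrative only; the function name `quad_form` and the sample vector are assumptions. It evaluates the two forms of Example 1 at one point.

```python
def quad_form(A, x):
    # Q(x) = x^T A x, written out as the sum of A[i][j] * x[i] * x[j].
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

A_diag = [[4, 0], [0, 3]]       # Example 1(a): 4*x1^2 + 3*x2^2
A_cross = [[3, -2], [-2, 7]]    # Example 1(b): 3*x1^2 - 4*x1*x2 + 7*x2^2

x = [1, 2]
qa = quad_form(A_diag, x)       # 4*1 + 3*4 = 16
qb = quad_form(A_cross, x)      # 3 - 4 - 4 + 28 = 23
```

In the double sum, the two off-diagonal entries $-2$ each contribute $-2x_1x_2$, which is exactly how the single cross-product term $-4x_1x_2$ arises.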


EXAMPLE 2 For $\mathbf{x}$ in $\mathbb{R}^3$, let $Q(\mathbf{x}) = 5x_1^2 + 3x_2^2 + 2x_3^2 - x_1x_2 + 8x_2x_3$. Write this
quadratic form as $\mathbf{x}^T A \mathbf{x}$.

SOLUTION The coefficients of $x_1^2$, $x_2^2$, $x_3^2$ go on the diagonal of $A$. To make $A$
symmetric, the coefficient of $x_i x_j$ for $i \ne j$ must be split evenly between the $(i, j)$- and
$(j, i)$-entries in $A$. The coefficient of $x_1x_3$ is 0. It is readily checked that

\[ Q(\mathbf{x}) = \mathbf{x}^T A \mathbf{x} = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} 5 & -1/2 & 0 \\ -1/2 & 3 & 4 \\ 0 & 4 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \quad ■ \]

EXAMPLE 3 Let $Q(\mathbf{x}) = x_1^2 - 8x_1x_2 - 5x_2^2$. Compute the value of $Q(\mathbf{x})$ for
$\mathbf{x} = \begin{bmatrix} -3 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 2 \\ -2 \end{bmatrix}$, and $\begin{bmatrix} 1 \\ -3 \end{bmatrix}$.

SOLUTION

\[ \begin{aligned}
Q(-3, 1) &= (-3)^2 - 8(-3)(1) - 5(1)^2 = 28 \\
Q(2, -2) &= (2)^2 - 8(2)(-2) - 5(-2)^2 = 16 \\
Q(1, -3) &= (1)^2 - 8(1)(-3) - 5(-3)^2 = -20
\end{aligned} \]

■
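
The bookkeeping rule from Example 2 (squared-term coefficients on the diagonal, each cross-product coefficient split evenly between two symmetric positions) can be sketched as follows. The function names and the 0-based indexing are assumptions made for this illustration; exact fractions are used so that entries such as $-1/2$ are not rounded.

```python
from fractions import Fraction

def matrix_of_form(n, squares, cross):
    # squares[i] is the coefficient of x_i^2; cross[(i, j)] with i < j is the
    # coefficient of x_i * x_j. Indices are 0-based in this sketch.
    A = [[Fraction(0)] * n for _ in range(n)]
    for i, coeff in enumerate(squares):
        A[i][i] = Fraction(coeff)
    for (i, j), coeff in cross.items():
        A[i][j] = A[j][i] = Fraction(coeff) / 2   # split evenly
    return A

def quad_form(A, x):
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

# Example 2: Q(x) = 5x1^2 + 3x2^2 + 2x3^2 - x1x2 + 8x2x3
A2 = matrix_of_form(3, [5, 3, 2], {(0, 1): -1, (1, 2): 8})

# Example 3: Q(x) = x1^2 - 8x1x2 - 5x2^2, evaluated at the three given vectors
A3 = matrix_of_form(2, [1, -5], {(0, 1): -8})
values = [quad_form(A3, x) for x in ([-3, 1], [2, -2], [1, -3])]
```

Here `A2` matches the matrix displayed in Example 2, and `values` reproduces the three numbers computed by hand in Example 3: 28, 16, and -20.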


In some cases, quadratic forms are easier to use when they have no cross-product 
terms —that is, when the matrix of the quadratic form is a diagonal matrix. Fortunately, 
the cross-product term can be eliminated by making a suitable change of variable. 


Change of Variable in a Quadratic Form 


If $\mathbf{x}$ represents a variable vector in $\mathbb{R}^n$, then a change of variable is an equation of the
form

\[ \mathbf{x} = P\mathbf{y}, \quad \text{or equivalently,} \quad \mathbf{y} = P^{-1}\mathbf{x} \qquad (1) \]

where $P$ is an invertible matrix and $\mathbf{y}$ is a new variable vector in $\mathbb{R}^n$. Here $\mathbf{y}$ is the
coordinate vector of $\mathbf{x}$ relative to the basis of $\mathbb{R}^n$ determined by the columns of $P$. (See
Section 4.4.)

If the change of variable (1) is made in a quadratic form $\mathbf{x}^T A \mathbf{x}$, then

\[ \mathbf{x}^T A \mathbf{x} = (P\mathbf{y})^T A (P\mathbf{y}) = \mathbf{y}^T P^T A P \mathbf{y} = \mathbf{y}^T (P^T A P) \mathbf{y} \qquad (2) \]

and the new matrix of the quadratic form is $P^T A P$. Since $A$ is symmetric, Theorem 2
guarantees that there is an orthogonal matrix $P$ such that $P^T A P$ is a diagonal matrix $D$,
and the quadratic form in (2) becomes $\mathbf{y}^T D \mathbf{y}$. This is the strategy of the next example.


EXAM PLE 4 Make a change of variable that transforms the quadratic form in Ex¬ 
ample 3 into a quadratic form with no cross-product term. 


SOLUTION The matrix of the quadratic form in Example 3 is 



The first step is to orthogonally diagonalize A . Its eigenvalues turn out to be A = 3 and 
A = —7. Associated unit eigenvectors are 




These vectors are automatically orthogonal (because they correspond to distinct eigen¬ 
values) and so provide an orthonormal basis for M 2 . Let 



1/V5 

2/V5 



Then A = PDP~ l and D = P~ X AP = P T AP , as pointed out earlier. A suitable change 
of variable is 


X\ 

x 2 




y i 

T2 


X = Py, 


where x 
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THEOREM 4 


Then 

x\ — 8x1X2 — 5x2 = x t Ax = ( Py) T A(Py ) 

= y T P T APy = y T Dy 

= 3 y\ ~ ly\ ■ 


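To illustrate the meaning of the equality of quadratic forms in Example 4, we can
compute $Q(\mathbf{x})$ for $\mathbf{x} = (2, -2)$ using the new quadratic form. First, since $\mathbf{x} = P\mathbf{y}$,

$$\mathbf{y} = P^{-1}\mathbf{x} = P^T\mathbf{x}$$

so

$$\mathbf{y} = \begin{bmatrix} 2/\sqrt{5} & -1/\sqrt{5} \\ 1/\sqrt{5} & 2/\sqrt{5} \end{bmatrix} \begin{bmatrix} 2 \\ -2 \end{bmatrix} = \begin{bmatrix} 6/\sqrt{5} \\ -2/\sqrt{5} \end{bmatrix}$$

Hence

$$3y_1^2 - 7y_2^2 = 3(6/\sqrt{5})^2 - 7(-2/\sqrt{5})^2 = 3(36/5) - 7(4/5) = 80/5 = 16$$

This is the value of $Q(\mathbf{x})$ in Example 3 when $\mathbf{x} = (2, -2)$. See Figure 1.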
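The arithmetic in Example 4 and in the check at $\mathbf{x} = (2, -2)$ can be repeated numerically. The following is only a sketch in plain Python; the helper names `matmul`, `transpose`, and `quad_form` are illustrative choices and do not come from the text.

```python
import math

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*X)]

def quad_form(A, x):
    """Evaluate x^T A x for a vector x given as a flat list."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

s = math.sqrt(5)
A = [[1, -4], [-4, -5]]                 # matrix of x1^2 - 8 x1 x2 - 5 x2^2
P = [[2 / s, 1 / s], [-1 / s, 2 / s]]   # columns are the unit eigenvectors
D = matmul(transpose(P), matmul(A, P))  # close to diag(3, -7), up to roundoff

x = [2, -2]
y = [sum(P[k][i] * x[k] for k in range(2)) for i in range(2)]  # y = P^T x
q_old = quad_form(A, x)                 # value from the original form
q_new = 3 * y[0] ** 2 - 7 * y[1] ** 2   # value from the diagonal form
```

Both `q_old` and `q_new` should agree with the value 16 found above, up to floating-point roundoff in `q_new`.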




FIGURE 1 Change of variable in $\mathbf{x}^T A\mathbf{x}$.

Example 4 illustrates the following theorem. The proof of the theorem was essentially
given before Example 4.


THEOREM 4 The Principal Axes Theorem

Let $A$ be an $n \times n$ symmetric matrix. Then there is an orthogonal change of
variable, $\mathbf{x} = P\mathbf{y}$, that transforms the quadratic form $\mathbf{x}^T A\mathbf{x}$ into a quadratic form
$\mathbf{y}^T D\mathbf{y}$ with no cross-product term.


The columns of $P$ in the theorem are called the principal axes of the quadratic
form $\mathbf{x}^T A\mathbf{x}$. The vector $\mathbf{y}$ is the coordinate vector of $\mathbf{x}$ relative to the orthonormal basis
of $\mathbb{R}^n$ given by these principal axes.

A Geometric View of Principal Axes 

Suppose $Q(\mathbf{x}) = \mathbf{x}^T A\mathbf{x}$, where $A$ is an invertible $2 \times 2$ symmetric matrix, and let $c$ be a
constant. It can be shown that the set of all $\mathbf{x}$ in $\mathbb{R}^2$ that satisfy

$$\mathbf{x}^T A\mathbf{x} = c \tag{3}$$

either corresponds to an ellipse (or circle), a hyperbola, two intersecting lines, or a single
point, or contains no points at all. If $A$ is a diagonal matrix, the graph is in standard
position, such as in Figure 2. If $A$ is not a diagonal matrix, the graph of equation (3) is
rotated out of standard position, as in Figure 3. Finding the principal axes (determined
by the eigenvectors of $A$) amounts to finding a new coordinate system with respect to
which the graph is in standard position.

FIGURE 2 An ellipse and a hyperbola in standard position.
Ellipse: $\dfrac{x_1^2}{a^2} + \dfrac{x_2^2}{b^2} = 1$, $a > b > 0$. Hyperbola: $\dfrac{x_1^2}{a^2} - \dfrac{x_2^2}{b^2} = 1$, $a > b > 0$.






FIGURE 3 An ellipse and a hyperbola not in standard position.
(a) $5x_1^2 - 4x_1x_2 + 5x_2^2 = 48$ (b) $x_1^2 - 8x_1x_2 - 5x_2^2 = 16$

The hyperbola in Figure 3(b) is the graph of the equation $\mathbf{x}^T A\mathbf{x} = 16$, where $A$ is
the matrix in Example 4. The positive $y_1$-axis in Figure 3(b) is in the direction of the
first column of the matrix $P$ in Example 4, and the positive $y_2$-axis is in the direction
of the second column of $P$.


EXAMPLE 5 The ellipse in Figure 3(a) is the graph of the equation $5x_1^2 - 4x_1x_2 + 5x_2^2 = 48$.
Find a change of variable that removes the cross-product term from the
equation.


SOLUTION The matrix of the quadratic form is $A = \begin{bmatrix} 5 & -2 \\ -2 & 5 \end{bmatrix}$. The eigenvalues of
$A$ turn out to be 3 and 7, with corresponding unit eigenvectors

$$\mathbf{u}_1 = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}, \qquad \mathbf{u}_2 = \begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$$

Let $P = [\,\mathbf{u}_1 \;\; \mathbf{u}_2\,] = \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}$. Then $P$ orthogonally diagonalizes $A$, so the
change of variable $\mathbf{x} = P\mathbf{y}$ produces the quadratic form $\mathbf{y}^T D\mathbf{y} = 3y_1^2 + 7y_2^2$. The new
axes for this change of variable are shown in Figure 3(a). ■


Classifying Quadratic Forms 

When $A$ is an $n \times n$ matrix, the quadratic form $Q(\mathbf{x}) = \mathbf{x}^T A\mathbf{x}$ is a real-valued function
with domain $\mathbb{R}^n$. Figure 4 displays the graphs of four quadratic forms with domain $\mathbb{R}^2$.
For each point $\mathbf{x} = (x_1, x_2)$ in the domain of a quadratic form $Q$, the graph displays the
point $(x_1, x_2, z)$ where $z = Q(\mathbf{x})$. Notice that except at $\mathbf{x} = \mathbf{0}$, the values of $Q(\mathbf{x})$ are
all positive in Figure 4(a) and all negative in Figure 4(d). The horizontal cross-sections
of the graphs are ellipses in Figures 4(a) and 4(d) and hyperbolas in Figure 4(c).



FIGURE 4 Graphs of quadratic forms. 


The simple $2 \times 2$ examples in Figure 4 illustrate the following definitions.

DEFINITION A quadratic form $Q$ is:

a. positive definite if $Q(\mathbf{x}) > 0$ for all $\mathbf{x} \neq \mathbf{0}$,

b. negative definite if $Q(\mathbf{x}) < 0$ for all $\mathbf{x} \neq \mathbf{0}$,

c. indefinite if $Q(\mathbf{x})$ assumes both positive and negative values.

Also, $Q$ is said to be positive semidefinite if $Q(\mathbf{x}) \geq 0$ for all $\mathbf{x}$, and to be negative
semidefinite if $Q(\mathbf{x}) \leq 0$ for all $\mathbf{x}$. The quadratic forms in parts (a) and (b) of Figure 4
are both positive semidefinite, but the form in (a) is better described as positive definite.
Theorem 5 characterizes some quadratic forms in terms of eigenvalues.

THEOREM 5 Quadratic Forms and Eigenvalues 

Let $A$ be an $n \times n$ symmetric matrix. Then a quadratic form $\mathbf{x}^T A\mathbf{x}$ is:

a. positive definite if and only if the eigenvalues of $A$ are all positive,

b. negative definite if and only if the eigenvalues of $A$ are all negative, or

c. indefinite if and only if $A$ has both positive and negative eigenvalues.


PROOF By the Principal Axes Theorem, there exists an orthogonal change of variable
$\mathbf{x} = P\mathbf{y}$ such that

$$Q(\mathbf{x}) = \mathbf{x}^T A\mathbf{x} = \mathbf{y}^T D\mathbf{y} = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2 \tag{4}$$

where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$. Since $P$ is invertible, there is a one-to-one
correspondence between all nonzero $\mathbf{x}$ and all nonzero $\mathbf{y}$. Thus the values of $Q(\mathbf{x})$
for $\mathbf{x} \neq \mathbf{0}$ coincide with the values of the expression on the right side of (4), which
is controlled by the signs of the eigenvalues $\lambda_1, \ldots, \lambda_n$, in the three ways
described in the theorem. ■
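For a $2 \times 2$ symmetric matrix the eigenvalues have a closed form, so Theorem 5 can be turned into a short test. The sketch below is illustrative only: the function name `classify_2x2` is hypothetical, the "semidefinite" fallback for a zero eigenvalue is an addition not covered by parts (a) to (c) of the theorem, and the exact comparisons with zero are sensitive to roundoff.

```python
import math

def classify_2x2(a, b, d):
    """Classify x^T A x for the symmetric matrix A = [[a, b], [b, d]]."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    lam_max, lam_min = mean + radius, mean - radius  # the two eigenvalues
    if lam_min > 0:
        return "positive definite"
    if lam_max < 0:
        return "negative definite"
    if lam_min < 0 < lam_max:
        return "indefinite"
    return "semidefinite"  # at least one eigenvalue is zero
```

For instance, `classify_2x2(1, -4, -5)` uses the matrix of Example 4 (eigenvalues 3 and -7), and `classify_2x2(5, -2, 5)` uses the matrix of Example 5 (eigenvalues 3 and 7).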


EXAMPLE 6 Is $Q(\mathbf{x}) = 3x_1^2 + 2x_2^2 + x_3^2 + 4x_1x_2 + 4x_2x_3$ positive definite?


SOLUTION Because of all the plus signs, this form "looks" positive definite. But the
matrix of the form is

$$A = \begin{bmatrix} 3 & 2 & 0 \\ 2 & 2 & 2 \\ 0 & 2 & 1 \end{bmatrix}$$

and the eigenvalues of $A$ turn out to be 5, 2, and $-1$. So $Q$ is an indefinite quadratic
form, not positive definite. ■
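One way to see the indefiniteness in Example 6 without computing all the eigenvalues is to exhibit one point where $Q$ is positive and one where it is negative. The sketch below assumes the eigenvector $(1, -2, 2)$ for the eigenvalue $-1$, which was found by hand from $(A + I)\mathbf{x} = \mathbf{0}$ and is not given in the text.

```python
def Q(x1, x2, x3):
    """The quadratic form of Example 6."""
    return 3 * x1**2 + 2 * x2**2 + x3**2 + 4 * x1 * x2 + 4 * x2 * x3

A = [[3, 2, 0], [2, 2, 2], [0, 2, 1]]   # matrix of the form
v = [1, -2, 2]                          # assumed eigenvector for eigenvalue -1
Av = [sum(A[i][j] * v[j] for j in range(3)) for i in range(3)]

positive_value = Q(1, 0, 0)   # 3
negative_value = Q(*v)        # -9, which is (-1) times the squared length of v
```

Since `Av` equals `[-1, 2, -2]`, that is, $-\mathbf{v}$, the vector is indeed an eigenvector for $-1$, and $Q$ takes both signs.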


The classification of a quadratic form is often carried over to the matrix of the form.
Thus a positive definite matrix $A$ is a symmetric matrix for which the quadratic form
$\mathbf{x}^T A\mathbf{x}$ is positive definite. Other terms, such as positive semidefinite matrix, are defined
analogously.



NUMERICAL NOTE - 

A fast way to determine whether a symmetric matrix A is positive definite is 
to attempt to factor A in the form A = R T R , where R is upper triangular with 
positive diagonal entries. (A slightly modified algorithm for an LU factorization 
is one approach.) Such a Cholesky factorization is possible if and only if A is 
positive definite. See Supplementary Exercise 7 at the end of Chapter 7. 
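The factorization test in the Numerical Note can be sketched in a few lines. This is a textbook-style illustration, not a production routine: the function name `cholesky_upper` is hypothetical, the input is assumed to be symmetric, and a pivot that is tiny but positive because of roundoff is not treated specially.

```python
import math

def cholesky_upper(A):
    """Return upper triangular R with A = R^T R, or None if the attempt fails."""
    n = len(A)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # What remains of A[i][i] after subtracting the earlier rows of R.
        pivot = A[i][i] - sum(R[k][i] ** 2 for k in range(i))
        if pivot <= 0:
            return None          # A is not positive definite
        R[i][i] = math.sqrt(pivot)
        for j in range(i + 1, n):
            R[i][j] = (A[i][j] - sum(R[k][i] * R[k][j] for k in range(i))) / R[i][i]
    return R
```

Applied to the matrix of Example 6, the attempt fails at the third pivot and returns `None`, in agreement with the negative eigenvalue found there. Applied to `[[4, 2], [2, 5]]`, it returns `[[2.0, 1.0], [0.0, 2.0]]`.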


PRACTICE PROBLEM 


Describe a positive semidefinite matrix A in terms of its eigenvalues. 



7.2 EXERCISES 


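1. Compute the quadratic form $\mathbf{x}^T A\mathbf{x}$, when $A = \begin{bmatrix} 5 & 1/3 \\ 1/3 & 1 \end{bmatrix}$ and

a. $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ b. $\mathbf{x} = \begin{bmatrix} 6 \\ 1 \end{bmatrix}$ c. $\mathbf{x} = \begin{bmatrix} 1 \\ 3 \end{bmatrix}$


2. Compute the quadratic form $\mathbf{x}^T A\mathbf{x}$, for $A = \begin{bmatrix} 3 & 2 & 0 \\ 2 & 2 & 1 \\ 0 & 1 & 0 \end{bmatrix}$ and

a. $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$ b. $\mathbf{x} = \begin{bmatrix} -2 \\ -1 \\ 5 \end{bmatrix}$ c. $\mathbf{x} = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$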


3. Find the matrix of the quadratic form. Assume $\mathbf{x}$ is in $\mathbb{R}^2$.

a. $3x_1^2 - 4x_1x_2 + 5x_2^2$ b. $3x_1^2 + 2x_1x_2$


4. Find the matrix of the quadratic form. Assume $\mathbf{x}$ is in $\mathbb{R}^2$.

a. $5x_1^2 + 16x_1x_2 - 5x_2^2$ b. $2x_1x_2$


5. Find the matrix of the quadratic form. Assume $\mathbf{x}$ is in $\mathbb{R}^3$.

a. $3x_1^2 + 2x_2^2 - 5x_3^2 - 6x_1x_2 + 8x_1x_3 - 4x_2x_3$

b. $6x_1x_2 + 4x_1x_3 - 10x_2x_3$


6. Find the matrix of the quadratic form. Assume $\mathbf{x}$ is in $\mathbb{R}^3$.

a. $3x_1^2 - 2x_2^2 + 5x_3^2 + 4x_1x_2 - 6x_1x_3$

b. \x\ — 2x\X 2 + 4x 2 x 3 


7. Make a change of variable, $\mathbf{x} = P\mathbf{y}$, that transforms the
quadratic form $x_1^2 + 10x_1x_2 + x_2^2$ into a quadratic form with
no cross-product term. Give $P$ and the new quadratic form.


8. Let $A$ be the matrix of the quadratic form

$$9x_1^2 + 7x_2^2 + 11x_3^2 - 8x_1x_2 + 8x_1x_3$$

It can be shown that the eigenvalues of $A$ are 3, 9, and 15.
Find an orthogonal matrix $P$ such that the change of variable
$\mathbf{x} = P\mathbf{y}$ transforms $\mathbf{x}^T A\mathbf{x}$ into a quadratic form with no cross-product
term. Give $P$ and the new quadratic form.


Classify the quadratic forms in Exercises 9-18. Then make a
change of variable, $\mathbf{x} = P\mathbf{y}$, that transforms the quadratic form
into one with no cross-product term. Write the new quadratic form.
Construct $P$ using the methods of Section 7.1.

9. $4x_1^2 - 4x_1x_2 + 4x_2^2$

10. $2x_1^2 + 6x_1x_2 - 6x_2^2$

11. $2x_1^2 - 4x_1x_2 - x_2^2$

12. $-x_1^2 - 2x_1x_2 - x_2^2$

13. $x_1^2 - 6x_1x_2 + 9x_2^2$

14. $3x_1^2 + 4x_1x_2$

15. [M] $-3x_1^2 - 7x_2^2 - 10x_3^2 - 10x_4^2 + 4x_1x_2 + 4x_1x_3 + 4x_1x_4 + 6x_3x_4$

16. [M] $4x_1^2 + 4x_2^2 + 4x_3^2 + 4x_4^2 + 8x_1x_2 + 8x_3x_4 - 6x_1x_4 + 6x_2x_3$

17. [M] $11x_1^2 + 11x_2^2 + 11x_3^2 + 11x_4^2 + 16x_1x_2 - 12x_1x_4 + 12x_2x_3 + 16x_3x_4$

18. [M] $2x_1^2 + 2x_2^2 - 6x_1x_2 - 6x_1x_3 - 6x_1x_4 - 6x_2x_3 - 6x_2x_4 - 2x_3x_4$


19. What is the largest possible value of the quadratic
form $5x_1^2 + 8x_2^2$ if $\mathbf{x} = (x_1, x_2)$ and $\mathbf{x}^T\mathbf{x} = 1$, that is, if
$x_1^2 + x_2^2 = 1$? (Try some examples of $\mathbf{x}$.)

20. What is the largest value of the quadratic form $5x_1^2 - 3x_2^2$ if
$\mathbf{x}^T\mathbf{x} = 1$?


In Exercises 21 and 22, matrices are $n \times n$ and vectors are in $\mathbb{R}^n$.
Mark each statement True or False. Justify each answer.


21. a. The matrix of a quadratic form is a symmetric matrix.

b. A quadratic form has no cross-product terms if and only
if the matrix of the quadratic form is a diagonal matrix.

c. The principal axes of a quadratic form $\mathbf{x}^T A\mathbf{x}$ are eigenvectors
of $A$.

d. A positive definite quadratic form $Q$ satisfies $Q(\mathbf{x}) > 0$
for all $\mathbf{x}$ in $\mathbb{R}^n$.

e. If the eigenvalues of a symmetric matrix $A$ are all positive,
then the quadratic form $\mathbf{x}^T A\mathbf{x}$ is positive definite.

f. A Cholesky factorization of a symmetric matrix $A$ has
the form $A = R^T R$, for an upper triangular matrix $R$ with
positive diagonal entries.

22. a. The expression $\|\mathbf{x}\|^2$ is not a quadratic form.

b. If $A$ is symmetric and $P$ is an orthogonal matrix, then
the change of variable $\mathbf{x} = P\mathbf{y}$ transforms $\mathbf{x}^T A\mathbf{x}$ into a
quadratic form with no cross-product term.

c. If $A$ is a $2 \times 2$ symmetric matrix, then the set of $\mathbf{x}$ such
that $\mathbf{x}^T A\mathbf{x} = c$ (for a constant $c$) corresponds to either a
circle, an ellipse, or a hyperbola.

d. An indefinite quadratic form is neither positive semidefinite
nor negative semidefinite.

e. If $A$ is symmetric and the quadratic form $\mathbf{x}^T A\mathbf{x}$ has only
negative values for $\mathbf{x} \neq \mathbf{0}$, then the eigenvalues of $A$ are
all positive.


Exercises 23 and 24 show how to classify a quadratic form
$Q(\mathbf{x}) = \mathbf{x}^T A\mathbf{x}$, when $A = \begin{bmatrix} a & b \\ b & d \end{bmatrix}$ and $\det A \neq 0$, without finding
the eigenvalues of $A$.


23. If $\lambda_1$ and $\lambda_2$ are the eigenvalues of $A$, then the characteristic
polynomial of $A$ can be written in two ways: $\det(A - \lambda I)$
and $(\lambda - \lambda_1)(\lambda - \lambda_2)$. Use this fact to show that $\lambda_1 + \lambda_2 =
a + d$ (the diagonal entries of $A$) and $\lambda_1\lambda_2 = \det A$.


24. Verify the following statements.

a. $Q$ is positive definite if $\det A > 0$ and $a > 0$.

b. $Q$ is negative definite if $\det A > 0$ and $a < 0$.

c. $Q$ is indefinite if $\det A < 0$.


25. Show that if $B$ is $m \times n$, then $B^T B$ is positive semidefinite;
and if $B$ is $n \times n$ and invertible, then $B^T B$ is positive definite.


26. Show that if an $n \times n$ matrix $A$ is positive definite, then there
exists a positive definite matrix $B$ such that $A = B^T B$. [Hint:
Write $A = PDP^T$, with $P^T = P^{-1}$. Produce a diagonal matrix
$C$ such that $D = C^T C$, and let $B = PCP^T$. Show that
$B$ works.]


27. Let $A$ and $B$ be symmetric $n \times n$ matrices whose eigenvalues
are all positive. Show that the eigenvalues of $A + B$ are all
positive. [Hint: Consider quadratic forms.]

28. Let $A$ be an $n \times n$ invertible symmetric matrix. Show that
if the quadratic form $\mathbf{x}^T A\mathbf{x}$ is positive definite, then so is the
quadratic form $\mathbf{x}^T A^{-1}\mathbf{x}$. [Hint: Consider eigenvalues.]



Mastering: Diagonalization and Quadratic Forms 7-7

SOLUTION TO PRACTICE PROBLEM

Make an orthogonal change of variable $\mathbf{x} = P\mathbf{y}$, and write

$$\mathbf{x}^T A\mathbf{x} = \mathbf{y}^T D\mathbf{y} = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2$$

as in equation (4). If an eigenvalue, say $\lambda_i$, were negative, then $\mathbf{x}^T A\mathbf{x}$ would be negative
for the $\mathbf{x}$ corresponding to $\mathbf{y} = \mathbf{e}_i$ (the $i$th column of $I_n$). So the eigenvalues
of a positive semidefinite quadratic form must all be nonnegative. Conversely, if the
eigenvalues are nonnegative, the expansion above shows that $\mathbf{x}^T A\mathbf{x}$ must be positive
semidefinite.


7.3 CONSTRAINED OPTIMIZATION 


Engineers, economists, scientists, and mathematicians often need to find the maximum 
or minimum value of a quadratic form Q(x) for x in some specified set. Typically, the 
problem can be arranged so that x varies over the set of unit vectors. This constrained 
optimization problem has an interesting and elegant solution. Example 6 below and the 
discussion in Section 7.5 will illustrate how such problems arise in practice. 

The requirement that a vector $\mathbf{x}$ in $\mathbb{R}^n$ be a unit vector can be stated in several
equivalent ways:

$$\|\mathbf{x}\| = 1, \qquad \|\mathbf{x}\|^2 = 1, \qquad \mathbf{x}^T\mathbf{x} = 1$$

and

$$x_1^2 + x_2^2 + \cdots + x_n^2 = 1 \tag{1}$$

The expanded version (1) of $\mathbf{x}^T\mathbf{x} = 1$ is commonly used in applications.

When a quadratic form $Q$ has no cross-product terms, it is easy to find the maximum
and minimum of $Q(\mathbf{x})$ for $\mathbf{x}^T\mathbf{x} = 1$.


EXAMPLE 1 Find the maximum and minimum values of $Q(\mathbf{x}) = 9x_1^2 + 4x_2^2 + 3x_3^2$
subject to the constraint $\mathbf{x}^T\mathbf{x} = 1$.

SOLUTION Since $x_2^2$ and $x_3^2$ are nonnegative, note that

$$4x_2^2 \leq 9x_2^2 \quad \text{and} \quad 3x_3^2 \leq 9x_3^2$$

and hence

$$Q(\mathbf{x}) = 9x_1^2 + 4x_2^2 + 3x_3^2 \leq 9x_1^2 + 9x_2^2 + 9x_3^2 = 9(x_1^2 + x_2^2 + x_3^2) = 9$$

whenever $x_1^2 + x_2^2 + x_3^2 = 1$. So the maximum value of $Q(\mathbf{x})$ cannot exceed 9 when
$\mathbf{x}$ is a unit vector. Furthermore, $Q(\mathbf{x}) = 9$ when $\mathbf{x} = (1, 0, 0)$. Thus 9 is the maximum
value of $Q(\mathbf{x})$ for $\mathbf{x}^T\mathbf{x} = 1$.

To find the minimum value of $Q(\mathbf{x})$, observe that

$$9x_1^2 \geq 3x_1^2, \qquad 4x_2^2 \geq 3x_2^2$$

and hence

$$Q(\mathbf{x}) \geq 3x_1^2 + 3x_2^2 + 3x_3^2 = 3(x_1^2 + x_2^2 + x_3^2) = 3$$

whenever $x_1^2 + x_2^2 + x_3^2 = 1$. Also, $Q(\mathbf{x}) = 3$ when $x_1 = 0$, $x_2 = 0$, and $x_3 = 1$. So 3
is the minimum value of $Q(\mathbf{x})$ when $\mathbf{x}^T\mathbf{x} = 1$. ■

It is easy to see in Example 1 that the matrix of the quadratic form $Q$ has eigenvalues
9, 4, and 3 and that the greatest and least eigenvalues equal, respectively, the
(constrained) maximum and minimum of $Q(\mathbf{x})$. The same holds true for any quadratic
form, as we shall see.


EXAMPLE 2 Let $A = \begin{bmatrix} 3 & 0 \\ 0 & 7 \end{bmatrix}$, and let $Q(\mathbf{x}) = \mathbf{x}^T A\mathbf{x}$ for $\mathbf{x}$ in $\mathbb{R}^2$. Figure 1 displays
the graph of $Q$. Figure 2 shows only the portion of the graph inside a cylinder;
the intersection of the cylinder with the surface is the set of points $(x_1, x_2, z)$ such
that $z = Q(x_1, x_2)$ and $x_1^2 + x_2^2 = 1$. The "heights" of these points are the constrained
values of $Q(\mathbf{x})$. Geometrically, the constrained optimization problem is to locate the
highest and lowest points on the intersection curve.

The two highest points on the curve are 7 units above the $x_1x_2$-plane, occurring
where $x_1 = 0$ and $x_2 = \pm 1$. These points correspond to the eigenvalue 7 of $A$ and
the eigenvectors $\mathbf{x} = (0, 1)$ and $-\mathbf{x} = (0, -1)$. Similarly, the two lowest points on the
curve are 3 units above the $x_1x_2$-plane. They correspond to the eigenvalue 3 and the
eigenvectors $(1, 0)$ and $(-1, 0)$. ■


FIGURE 1 $z = 3x_1^2 + 7x_2^2$.


FIGURE 2 The intersection of $z = 3x_1^2 + 7x_2^2$ and the cylinder $x_1^2 + x_2^2 = 1$.


Every point on the intersection curve in Figure 2 has a $z$-coordinate between 3 and
7, and for any number $t$ between 3 and 7, there is a unit vector $\mathbf{x}$ such that $Q(\mathbf{x}) = t$.
In other words, the set of all possible values of $\mathbf{x}^T A\mathbf{x}$, for $\|\mathbf{x}\| = 1$, is the closed interval
$3 \leq t \leq 7$.
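The interval $3 \leq t \leq 7$ can be observed numerically by walking around the unit circle. The following sketch samples $\mathbf{x} = (\cos\theta, \sin\theta)$ at evenly spaced angles; the number of samples `N` is an arbitrary choice, and a finite sample only approximates the true minimum and maximum in general.

```python
import math

def Q(x1, x2):
    """Quadratic form with matrix diag(3, 7), as in Example 2."""
    return 3 * x1**2 + 7 * x2**2

N = 3600  # number of sample points on the unit circle (arbitrary)
values = [Q(math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N))
          for k in range(N)]
lowest, highest = min(values), max(values)
```

Here the sample happens to include the angles $0$ and $\pi/2$, so `lowest` and `highest` land on the eigenvalues 3 and 7 up to roundoff.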


It can be shown that for any symmetric matrix $A$, the set of all possible values of
$\mathbf{x}^T A\mathbf{x}$, for $\|\mathbf{x}\| = 1$, is a closed interval on the real axis. (See Exercise 13.) Denote the
left and right endpoints of this interval by $m$ and $M$, respectively. That is, let

$$m = \min\{\mathbf{x}^T A\mathbf{x} : \|\mathbf{x}\| = 1\}, \qquad M = \max\{\mathbf{x}^T A\mathbf{x} : \|\mathbf{x}\| = 1\} \tag{2}$$

Exercise 12 asks you to prove that if $\lambda$ is an eigenvalue of $A$, then $m \leq \lambda \leq M$. The
next theorem says that $m$ and $M$ are themselves eigenvalues of $A$, just as in Example 2.$^1$


$^1$ The use of minimum and maximum in (2), and least and greatest in the theorem, refers to the natural
ordering of the real numbers, not to magnitudes.


THEOREM 6

Let $A$ be a symmetric matrix, and define $m$ and $M$ as in (2). Then $M$ is the greatest
eigenvalue $\lambda_1$ of $A$ and $m$ is the least eigenvalue of $A$. The value of $\mathbf{x}^T A\mathbf{x}$ is $M$
when $\mathbf{x}$ is a unit eigenvector $\mathbf{u}_1$ corresponding to $M$. The value of $\mathbf{x}^T A\mathbf{x}$ is $m$ when
$\mathbf{x}$ is a unit eigenvector corresponding to $m$.


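PROOF Orthogonally diagonalize $A$ as $PDP^{-1}$. We know that

$$\mathbf{x}^T A\mathbf{x} = \mathbf{y}^T D\mathbf{y} \quad \text{when } \mathbf{x} = P\mathbf{y} \tag{3}$$

Also,

$$\|\mathbf{x}\| = \|P\mathbf{y}\| = \|\mathbf{y}\| \quad \text{for all } \mathbf{y}$$

because $P^T P = I$ and $\|P\mathbf{y}\|^2 = (P\mathbf{y})^T(P\mathbf{y}) = \mathbf{y}^T P^T P\mathbf{y} = \mathbf{y}^T\mathbf{y} = \|\mathbf{y}\|^2$. In particular,
$\|\mathbf{y}\| = 1$ if and only if $\|\mathbf{x}\| = 1$. Thus $\mathbf{x}^T A\mathbf{x}$ and $\mathbf{y}^T D\mathbf{y}$ assume the same set of values as
$\mathbf{x}$ and $\mathbf{y}$ range over the set of all unit vectors.

To simplify notation, suppose that $A$ is a $3 \times 3$ matrix with eigenvalues $a \geq b \geq c$.
Arrange the (eigenvector) columns of $P$ so that $P = [\,\mathbf{u}_1 \;\; \mathbf{u}_2 \;\; \mathbf{u}_3\,]$ and

$$D = \begin{bmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{bmatrix}$$

Given any unit vector $\mathbf{y}$ in $\mathbb{R}^3$ with coordinates $y_1, y_2, y_3$, observe that

$$ay_1^2 = ay_1^2, \qquad by_2^2 \leq ay_2^2, \qquad cy_3^2 \leq ay_3^2$$

and obtain these inequalities:

$$\mathbf{y}^T D\mathbf{y} = ay_1^2 + by_2^2 + cy_3^2 \leq ay_1^2 + ay_2^2 + ay_3^2 = a(y_1^2 + y_2^2 + y_3^2) = a\|\mathbf{y}\|^2 = a$$

Thus $M \leq a$, by definition of $M$. However, $\mathbf{y}^T D\mathbf{y} = a$ when $\mathbf{y} = \mathbf{e}_1 = (1, 0, 0)$, so in
fact $M = a$. By (3), the $\mathbf{x}$ that corresponds to $\mathbf{y} = \mathbf{e}_1$ is the eigenvector $\mathbf{u}_1$ of $A$, because

$$\mathbf{x} = P\mathbf{e}_1 = [\,\mathbf{u}_1 \;\; \mathbf{u}_2 \;\; \mathbf{u}_3\,] \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \mathbf{u}_1$$

Thus $M = a = \mathbf{e}_1^T D\mathbf{e}_1 = \mathbf{u}_1^T A\mathbf{u}_1$, which proves the statement about $M$. A similar
argument shows that $m$ is the least eigenvalue, $c$, and this value of $\mathbf{x}^T A\mathbf{x}$ is attained when
$\mathbf{x} = P\mathbf{e}_3 = \mathbf{u}_3$. ■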


EXAMPLE 3 Let $A = \begin{bmatrix} 3 & 2 & 1 \\ 2 & 3 & 1 \\ 1 & 1 & 4 \end{bmatrix}$. Find the maximum value of the quadratic
form $\mathbf{x}^T A\mathbf{x}$ subject to the constraint $\mathbf{x}^T\mathbf{x} = 1$, and find a unit vector at which this maximum
value is attained.


SOLUTION By Theorem 6, the desired maximum value is the greatest eigenvalue of
$A$. The characteristic equation turns out to be

$$0 = -\lambda^3 + 10\lambda^2 - 27\lambda + 18 = -(\lambda - 6)(\lambda - 3)(\lambda - 1)$$

The greatest eigenvalue is 6.

The constrained maximum of $\mathbf{x}^T A\mathbf{x}$ is attained when $\mathbf{x}$ is a unit eigenvector for
$\lambda = 6$. Solve $(A - 6I)\mathbf{x} = \mathbf{0}$ and find an eigenvector $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$. Set $\mathbf{u}_1 = \begin{bmatrix} 1/\sqrt{3} \\ 1/\sqrt{3} \\ 1/\sqrt{3} \end{bmatrix}$. ■


In Theorem 7 and in later applications, the values of $\mathbf{x}^T A\mathbf{x}$ are computed with additional
constraints on the unit vector $\mathbf{x}$.


THEOREM 7

Let $A$, $\lambda_1$, and $\mathbf{u}_1$ be as in Theorem 6. Then the maximum value of $\mathbf{x}^T A\mathbf{x}$ subject
to the constraints

$$\mathbf{x}^T\mathbf{x} = 1, \qquad \mathbf{x}^T\mathbf{u}_1 = 0$$

is the second greatest eigenvalue, $\lambda_2$, and this maximum is attained when $\mathbf{x}$ is an
eigenvector $\mathbf{u}_2$ corresponding to $\lambda_2$.


Theorem 7 can be proved by an argument similar to the one above in which the 
theorem is reduced to the case where the matrix of the quadratic form is diagonal. The 
next example gives an idea of the proof for the case of a diagonal matrix. 

EXAMPLE 4 Find the maximum value of $9x_1^2 + 4x_2^2 + 3x_3^2$ subject to the constraints
$\mathbf{x}^T\mathbf{x} = 1$ and $\mathbf{x}^T\mathbf{u}_1 = 0$, where $\mathbf{u}_1 = (1, 0, 0)$. Note that $\mathbf{u}_1$ is a unit eigenvector
corresponding to the greatest eigenvalue $\lambda = 9$ of the matrix of the quadratic form.

SOLUTION If the coordinates of $\mathbf{x}$ are $x_1, x_2, x_3$, then the constraint $\mathbf{x}^T\mathbf{u}_1 = 0$ means
simply that $x_1 = 0$. For such a unit vector, $x_2^2 + x_3^2 = 1$, and

$$9x_1^2 + 4x_2^2 + 3x_3^2 = 4x_2^2 + 3x_3^2 \leq 4x_2^2 + 4x_3^2 = 4(x_2^2 + x_3^2) = 4$$

Thus the constrained maximum of the quadratic form does not exceed 4. And this value
is attained for $\mathbf{x} = (0, 1, 0)$, which is an eigenvector for the second greatest eigenvalue
of the matrix of the quadratic form. ■


EXAMPLE 5 Let $A$ be the matrix in Example 3 and let $\mathbf{u}_1$ be a unit eigenvector
corresponding to the greatest eigenvalue of $A$. Find the maximum value of $\mathbf{x}^T A\mathbf{x}$ subject
to the conditions

$$\mathbf{x}^T\mathbf{x} = 1, \qquad \mathbf{x}^T\mathbf{u}_1 = 0 \tag{4}$$

SOLUTION From Example 3, the second greatest eigenvalue of $A$ is $\lambda = 3$. Solve
$(A - 3I)\mathbf{x} = \mathbf{0}$ to find an eigenvector, and normalize it to obtain

$$\mathbf{u}_2 = \begin{bmatrix} 1/\sqrt{6} \\ 1/\sqrt{6} \\ -2/\sqrt{6} \end{bmatrix}$$

The vector $\mathbf{u}_2$ is automatically orthogonal to $\mathbf{u}_1$ because the vectors correspond to different
eigenvalues. Thus the maximum of $\mathbf{x}^T A\mathbf{x}$ subject to the constraints in (4) is 3,
attained when $\mathbf{x} = \mathbf{u}_2$. ■
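The claims in Examples 3 and 5 are easy to check by direct multiplication. The sketch below uses only the matrix and the unit eigenvectors stated in those examples; the helper names `matvec` and `dot` are illustrative.

```python
import math

A = [[3, 2, 1], [2, 3, 1], [1, 1, 4]]

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dot(u, v):
    """Inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

u1 = [1 / math.sqrt(3)] * 3
u2 = [1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6)]

q1 = dot(u1, matvec(A, u1))   # x^T A x at x = u1
q2 = dot(u2, matvec(A, u2))   # x^T A x at x = u2
overlap = dot(u1, u2)         # zero if u2 satisfies the second constraint in (4)
```

Up to roundoff, `q1` is 6, `q2` is 3, and `overlap` is 0, matching Theorems 6 and 7.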


The next theorem generalizes Theorem 7 and, together with Theorem 6, gives a
useful characterization of all the eigenvalues of $A$. The proof is omitted.

THEOREM 8 Let $A$ be a symmetric $n \times n$ matrix with an orthogonal diagonalization
$A = PDP^{-1}$, where the entries on the diagonal of $D$ are arranged so that
$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$ and where the columns of $P$ are corresponding unit eigenvectors
$\mathbf{u}_1, \ldots, \mathbf{u}_n$. Then for $k = 2, \ldots, n$, the maximum value of $\mathbf{x}^T A\mathbf{x}$ subject to
the constraints

$$\mathbf{x}^T\mathbf{x} = 1, \quad \mathbf{x}^T\mathbf{u}_1 = 0, \quad \ldots, \quad \mathbf{x}^T\mathbf{u}_{k-1} = 0$$

is the eigenvalue $\lambda_k$, and this maximum is attained at $\mathbf{x} = \mathbf{u}_k$.


Theorem 8 will be helpful in Sections 7.4 and 7.5. The following application re¬ 
quires only Theorem 6. 


EXAMPLE 6 During the next year, a county government is planning to repair $x$
hundred miles of public roads and bridges and to improve $y$ hundred acres of parks and
recreation areas. The county must decide how to allocate its resources (funds, equipment,
labor, etc.) between these two projects. If it is more cost effective to work
simultaneously on both projects rather than on only one, then $x$ and $y$ might satisfy a
constraint such as

$$4x^2 + 9y^2 \leq 36$$

See Figure 3. Each point $(x, y)$ in the shaded feasible set represents a possible public
works schedule for the year. The points on the constraint curve, $4x^2 + 9y^2 = 36$, use
the maximum amounts of resources available.


FIGURE 3 Public works schedules. (Horizontal axis: road and bridge repair, $x$; vertical axis: parks and
recreation, $y$. The feasible set is the region bounded by $4x^2 + 9y^2 = 36$.)


In choosing its public works schedule, the county wants to consider the opinions of 
the county residents. To measure the value, or utility , that the residents would assign to 
the various work schedules (x, y), economists sometimes use a function such as 

q(x,y) = xy 

The set of points $(x, y)$ at which $q(x, y)$ is a constant is called an indifference curve.
Three such curves are shown in Figure 4. Points along an indifference curve correspond
to alternatives that county residents as a group would find equally valuable.$^2$ Find the
public works schedule that maximizes the utility function $q$.

$^2$ Indifference curves are discussed in Michael D. Intriligator, Ronald G. Bodkin, and Cheng Hsiao,
Econometric Models, Techniques, and Applications (Upper Saddle River, NJ: Prentice-Hall, 1996).

FIGURE 4 The optimum public works schedule is $(2.1, 1.4)$. (Horizontal axis: road and bridge repair, $x$;
vertical axis: parks and recreation, $y$. Shown are the constraint curve $4x^2 + 9y^2 = 36$ and the
indifference curves $q(x, y) = 2$, $q(x, y) = 3$, and $q(x, y) = 4$.)

SOLUTION The constraint equation $4x^2 + 9y^2 = 36$ does not describe a set of unit
vectors, but a change of variable can fix that problem. Rewrite the constraint in the form

$$\left(\frac{x}{3}\right)^2 + \left(\frac{y}{2}\right)^2 = 1$$


and define

$$x_1 = \frac{x}{3}, \quad x_2 = \frac{y}{2}, \quad \text{that is,} \quad x = 3x_1 \text{ and } y = 2x_2$$

Then the constraint equation becomes

$$x_1^2 + x_2^2 = 1$$

and the utility function becomes $q(3x_1, 2x_2) = (3x_1)(2x_2) = 6x_1x_2$. Let $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$.
Then the problem is to maximize $Q(\mathbf{x}) = 6x_1x_2$ subject to $\mathbf{x}^T\mathbf{x} = 1$. Note that $Q(\mathbf{x}) = \mathbf{x}^T A\mathbf{x}$, where

$$A = \begin{bmatrix} 0 & 3 \\ 3 & 0 \end{bmatrix}$$

The eigenvalues of $A$ are $\pm 3$, with eigenvectors $\begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$ for $\lambda = 3$ and $\begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$ for
$\lambda = -3$. Thus the maximum value of $Q(\mathbf{x}) = q(x_1, x_2)$ is 3, attained when $x_1 = 1/\sqrt{2}$
and $x_2 = 1/\sqrt{2}$.

In terms of the original variables, the optimum public works schedule is $x = 3x_1 =
3/\sqrt{2} \approx 2.1$ hundred miles of roads and bridges and $y = 2x_2 = \sqrt{2} \approx 1.4$ hundred
acres of parks and recreational areas. The optimum public works schedule is the point
where the constraint curve and the indifference curve $q(x, y) = 3$ just meet. Points
$(x, y)$ with a higher utility lie on indifference curves that do not touch the constraint
curve. See Figure 4. ■
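The answer to Example 6 can be sanity-checked by scanning the constraint curve directly. The sketch below parametrizes $4x^2 + 9y^2 = 36$ as $x = 3\cos t$, $y = 2\sin t$, which is a choice made here for illustration and is not part of the solution above; the scan resolution `N` is arbitrary.

```python
import math

def utility(x, y):
    """The utility function q(x, y) = xy of Example 6."""
    return x * y

def on_constraint(t):
    """Point on 4x^2 + 9y^2 = 36, written as x = 3 cos t, y = 2 sin t."""
    return 3 * math.cos(t), 2 * math.sin(t)

x_opt, y_opt = 3 / math.sqrt(2), math.sqrt(2)   # schedule found in the text

# Coarse scan of the first-quadrant arc, where x and y are both nonnegative.
N = 1000
best = max(utility(*on_constraint(math.pi / 2 * k / N)) for k in range(N + 1))
```

The scan should find no utility above 3 (up to roundoff), and the point `(x_opt, y_opt)` should lie on the constraint curve with utility 3.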


PRACTICE PROBLEMS 

1. Let $Q(\mathbf{x}) = 3x_1^2 + 3x_2^2 + 2x_1x_2$. Find a change of variable that transforms $Q$ into a
quadratic form with no cross-product term, and give the new quadratic form.

2. With $Q$ as in Problem 1, find the maximum value of $Q(\mathbf{x})$ subject to the constraint
$\mathbf{x}^T\mathbf{x} = 1$, and find a unit vector at which the maximum is attained.


7.3 EXERCISES 

In Exercises 1 and 2, find the change of variable $\mathbf{x} = P\mathbf{y}$ that
transforms the quadratic form $\mathbf{x}^T A\mathbf{x}$ into $\mathbf{y}^T D\mathbf{y}$ as shown.

1. $5x_1^2 + 6x_2^2 + 7x_3^2 + 4x_1x_2 - 4x_2x_3 = 9y_1^2 + 6y_2^2 + 3y_3^2$

2. $3x_1^2 + 3x_2^2 + 5x_3^2 + 6x_1x_2 + 2x_1x_3 + 2x_2x_3 = 7y_1^2 + 4y_2^2$

Hint: $\mathbf{x}$ and $\mathbf{y}$ must have the same number of coordinates, so the
quadratic form shown here must have a coefficient of zero for $y_3^2$.


In Exercises 3-6, find (a) the maximum value of $Q(\mathbf{x})$ subject to
the constraint $\mathbf{x}^T\mathbf{x} = 1$, (b) a unit vector $\mathbf{u}$ where this maximum is
attained, and (c) the maximum of $Q(\mathbf{x})$ subject to the constraints
$\mathbf{x}^T\mathbf{x} = 1$ and $\mathbf{x}^T\mathbf{u} = 0$.

3. $Q(\mathbf{x}) = 5x_1^2 + 6x_2^2 + 7x_3^2 + 4x_1x_2 - 4x_2x_3$
(See Exercise 1.)

4. $Q(\mathbf{x}) = 3x_1^2 + 3x_2^2 + 5x_3^2 + 6x_1x_2 + 2x_1x_3 + 2x_2x_3$
(See Exercise 2.)

5. $Q(\mathbf{x}) = x_1^2 + x_2^2 - 10x_1x_2$

6. $Q(\mathbf{x}) = 3x_1^2 + 9x_2^2 + 8x_1x_2$

7. Let $Q(\mathbf{x}) = -2x_1^2 - x_2^2 + 4x_1x_2 + 4x_2x_3$. Find a unit vector
$\mathbf{x}$ in $\mathbb{R}^3$ at which $Q(\mathbf{x})$ is maximized, subject to $\mathbf{x}^T\mathbf{x} = 1$.
[Hint: The eigenvalues of the matrix of the quadratic form
$Q$ are 2, $-1$, and $-4$.]

8. Let $Q(\mathbf{x}) = 7x_1^2 + x_2^2 + 7x_3^2 - 8x_1x_2 - 4x_1x_3 - 8x_2x_3$.
Find a unit vector $\mathbf{x}$ in $\mathbb{R}^3$ at which $Q(\mathbf{x})$ is maximized,
subject to $\mathbf{x}^T\mathbf{x} = 1$. [Hint: The eigenvalues of the matrix of
the quadratic form $Q$ are 9 and $-3$.]

9. Find the maximum value of $Q(\mathbf{x}) = 7x_1^2 + 3x_2^2 - 2x_1x_2$,
subject to the constraint $x_1^2 + x_2^2 = 1$. (Do not go on to find
a vector where the maximum is attained.)

10. Find the maximum value of $Q(\mathbf{x}) = -3x_1^2 + 5x_2^2 - 2x_1x_2$,
subject to the constraint $x_1^2 + x_2^2 = 1$. (Do not go on to find
a vector where the maximum is attained.)

11. Suppose $\mathbf{x}$ is a unit eigenvector of a matrix $A$ corresponding
to an eigenvalue 3. What is the value of $\mathbf{x}^T A\mathbf{x}$?


12. Let $\lambda$ be any eigenvalue of a symmetric matrix $A$. Justify
the statement made in this section that $m \leq \lambda \leq M$, where
$m$ and $M$ are defined as in (2). [Hint: Find an $\mathbf{x}$ such that
$\lambda = \mathbf{x}^T A\mathbf{x}$.]

13. Let $A$ be an $n \times n$ symmetric matrix, let $M$ and $m$ denote the
maximum and minimum values of the quadratic form $\mathbf{x}^T A\mathbf{x}$,
where $\mathbf{x}^T\mathbf{x} = 1$, and denote corresponding unit eigenvectors
by $\mathbf{u}_1$ and $\mathbf{u}_n$. The following calculations show that given any
number $t$ between $M$ and $m$, there is a unit vector $\mathbf{x}$ such that
$t = \mathbf{x}^T A\mathbf{x}$. Verify that $t = (1 - \alpha)m + \alpha M$ for some number
$\alpha$ between 0 and 1. Then let $\mathbf{x} = \sqrt{1 - \alpha}\,\mathbf{u}_n + \sqrt{\alpha}\,\mathbf{u}_1$, and
show that $\mathbf{x}^T\mathbf{x} = 1$ and $\mathbf{x}^T A\mathbf{x} = t$.

[M] In Exercises 14-17, follow the instructions given for Exercises 3-6.

14. $3x_1x_2 + 5x_1x_3 + 7x_1x_4 + 7x_2x_3 + 5x_2x_4 + 3x_3x_4$

15. 4x 3 — 6 xix 2 — 10xix 3 —10xix 4 — 6 x 2 x 3 — 6 x 2 x 4 —2x 3 x 4 

16. — 6 x^ — 1 Oxf — 0 X 3 — 13xJ — 4xi x 2 — 4xi x 3 — 4xi x 4 + 6 x 3 x 4 

17. $x_1x_2 + 3x_1x_3 + 30x_1x_4 + 30x_2x_3 + 3x_2x_4 + x_3x_4$



The maximum value of $Q(\mathbf{x})$
subject to $\mathbf{x}^T\mathbf{x} = 1$ is 4.


SOLUTIONS TO PRACTICE PROBLEMS 


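1. The matrix of the quadratic form is $A = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}$. It is easy to find the eigenvalues,
4 and 2, and corresponding unit eigenvectors, $\begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$ and $\begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$. So the
desired change of variable is $\mathbf{x} = P\mathbf{y}$, where $P = \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}$. (A common
error here is to forget to normalize the eigenvectors.) The new quadratic form is
$\mathbf{y}^T D\mathbf{y} = 4y_1^2 + 2y_2^2$.

2. The maximum of $Q(\mathbf{x})$, for a unit vector $\mathbf{x}$, is 4 and the maximum is attained at
the unit eigenvector $\begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$. [A common incorrect answer is $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$. This vector
maximizes the quadratic form $\mathbf{y}^T D\mathbf{y}$ instead of $Q(\mathbf{x})$.]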


7.4 THE SINGULAR VALUE DECOMPOSITION 


The diagonalization theorems in Sections 5.3 and 7.1 play a part in many interesting
applications. Unfortunately, as we know, not all matrices can be factored as $A = PDP^{-1}$
with $D$ diagonal. However, a factorization $A = QDP^{-1}$ is possible for any $m \times n$
matrix $A$! A special factorization of this type, called the singular value decomposition,
is one of the most useful matrix factorizations in applied linear algebra.

The singular value decomposition is based on the following property of the ordinary
diagonalization that can be imitated for rectangular matrices: The absolute values of the
eigenvalues of a symmetric matrix $A$ measure the amounts that $A$ stretches or shrinks
certain vectors (the eigenvectors). If $A\mathbf{x} = \lambda\mathbf{x}$ and $\|\mathbf{x}\| = 1$, then

$$\|A\mathbf{x}\| = \|\lambda\mathbf{x}\| = |\lambda|\,\|\mathbf{x}\| = |\lambda| \tag{1}$$

If $\lambda_1$ is the eigenvalue with the greatest magnitude, then a corresponding unit eigenvector
$\mathbf{v}_1$ identifies a direction in which the stretching effect of $A$ is greatest. That is, the
length of $A\mathbf{x}$ is maximized when $\mathbf{x} = \mathbf{v}_1$, and $\|A\mathbf{v}_1\| = |\lambda_1|$, by (1). This description of
$\mathbf{v}_1$ and $|\lambda_1|$ has an analogue for rectangular matrices that will lead to the singular value
decomposition.


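EXAMPLE 1 If $A = \begin{bmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{bmatrix}$, then the linear transformation $\mathbf{x} \mapsto A\mathbf{x}$ maps
the unit sphere $\{\mathbf{x} : \|\mathbf{x}\| = 1\}$ in $\mathbb{R}^3$ onto an ellipse in $\mathbb{R}^2$, shown in Figure 1. Find a unit
vector $\mathbf{x}$ at which the length $\|A\mathbf{x}\|$ is maximized, and compute this maximum length.


FIGURE 1 A transformation from $\mathbb{R}^3$ to $\mathbb{R}^2$ (multiplication by $A$).


SOLUTION The quantity $\|A\mathbf{x}\|^2$ is maximized at the same $\mathbf{x}$ that maximizes $\|A\mathbf{x}\|$, and
$\|A\mathbf{x}\|^2$ is easier to study. Observe that

$$\|A\mathbf{x}\|^2 = (A\mathbf{x})^T(A\mathbf{x}) = \mathbf{x}^T A^T A\mathbf{x} = \mathbf{x}^T(A^T A)\mathbf{x}$$

Also, $A^T A$ is a symmetric matrix, since $(A^T A)^T = A^T A^{TT} = A^T A$. So the problem now
is to maximize the quadratic form $\mathbf{x}^T(A^T A)\mathbf{x}$ subject to the constraint $\|\mathbf{x}\| = 1$. By
Theorem 6 in Section 7.3, the maximum value is the greatest eigenvalue $\lambda_1$ of $A^T A$.
Also, the maximum value is attained at a unit eigenvector of $A^T A$ corresponding to $\lambda_1$.
For the matrix $A$ in this example,

$$A^T A = \begin{bmatrix} 4 & 8 \\ 11 & 7 \\ 14 & -2 \end{bmatrix} \begin{bmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{bmatrix} = \begin{bmatrix} 80 & 100 & 40 \\ 100 & 170 & 140 \\ 40 & 140 & 200 \end{bmatrix}$$

The eigenvalues of $A^T A$ are $\lambda_1 = 360$, $\lambda_2 = 90$, and $\lambda_3 = 0$. Corresponding unit eigenvectors
are, respectively,

$$\mathbf{v}_1 = \begin{bmatrix} 1/3 \\ 2/3 \\ 2/3 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} -2/3 \\ -1/3 \\ 2/3 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} 2/3 \\ -2/3 \\ 1/3 \end{bmatrix}$$

The maximum value of $\|A\mathbf{x}\|^2$ is 360, attained when $\mathbf{x}$ is the unit vector $\mathbf{v}_1$. The vector
$A\mathbf{v}_1$ is a point on the ellipse in Figure 1 farthest from the origin, namely,

$$A\mathbf{v}_1 = \begin{bmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{bmatrix} \begin{bmatrix} 1/3 \\ 2/3 \\ 2/3 \end{bmatrix} = \begin{bmatrix} 18 \\ 6 \end{bmatrix}$$

For $\|\mathbf{x}\| = 1$, the maximum value of $\|A\mathbf{x}\|$ is $\|A\mathbf{v}_1\| = \sqrt{360} = 6\sqrt{10}$. ■

Example 1 suggests that the effect of $A$ on the unit sphere in $\mathbb{R}^3$ is related to the
quadratic form $\mathbf{x}^T(A^T A)\mathbf{x}$. In fact, the entire geometric behavior of the transformation
$\mathbf{x} \mapsto A\mathbf{x}$ is captured by this quadratic form, as we shall see.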
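The computations in Example 1 can be reproduced with a few lines of plain Python. This is a sketch for checking the arithmetic only; the helper names `matvec` and `transpose` are illustrative, and the eigenvector $\mathbf{v}_1$ is taken from the example rather than computed.

```python
import math

A = [[4, 11, 14], [8, 7, -2]]

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

def transpose(M):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

At = transpose(A)
# Entry (i, j) of A^T A is the dot product of columns i and j of A.
AtA = [[sum(At[i][k] * A[k][j] for k in range(2)) for j in range(3)]
       for i in range(3)]

v1 = [1 / 3, 2 / 3, 2 / 3]        # unit eigenvector quoted in Example 1
AtA_v1 = matvec(AtA, v1)          # close to 360 * v1
Av1 = matvec(A, v1)               # close to (18, 6)
length = math.hypot(*Av1)         # close to sqrt(360) = 6 * sqrt(10)
```

The value of `length` should agree with $6\sqrt{10} \approx 18.97$, the maximum of $\|A\mathbf{x}\|$ over unit vectors.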


The Singular Values of an m x n Matrix 


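Let $A$ be an $m \times n$ matrix. Then $A^T A$ is symmetric and can be orthogonally diagonalized.
Let $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ be an orthonormal basis for $\mathbb{R}^n$ consisting of eigenvectors of $A^T A$, and
let $\lambda_1, \ldots, \lambda_n$ be the associated eigenvalues of $A^T A$. Then, for $1 \leq i \leq n$,

$$\begin{aligned} \|A\mathbf{v}_i\|^2 &= (A\mathbf{v}_i)^T A\mathbf{v}_i = \mathbf{v}_i^T A^T A\mathbf{v}_i \\ &= \mathbf{v}_i^T(\lambda_i\mathbf{v}_i) && \text{Since } \mathbf{v}_i \text{ is an eigenvector of } A^T A \\ &= \lambda_i && \text{Since } \mathbf{v}_i \text{ is a unit vector} \end{aligned} \tag{2}$$

So the eigenvalues of $A^T A$ are all nonnegative. By renumbering, if necessary, we may
assume that the eigenvalues are arranged so that

$$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq 0$$

The singular values of $A$ are the square roots of the eigenvalues of $A^T A$, denoted by
$\sigma_1, \ldots, \sigma_n$, and they are arranged in decreasing order. That is, $\sigma_i = \sqrt{\lambda_i}$ for $1 \leq i \leq n$.
By equation (2), the singular values of $A$ are the lengths of the vectors $A\mathbf{v}_1, \ldots, A\mathbf{v}_n$.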


EXAMPLE 2 Let $A$ be the matrix in Example 1. Since the eigenvalues of $A^T A$ are
360, 90, and 0, the singular values of $A$ are

$$\sigma_1 = \sqrt{360} = 6\sqrt{10}, \quad \sigma_2 = \sqrt{90} = 3\sqrt{10}, \quad \sigma_3 = 0$$

From Example 1, the first singular value of $A$ is the maximum of $\|A\mathbf{x}\|$ over all unit
vectors, and the maximum is attained at the unit eigenvector $\mathbf{v}_1$. Theorem 7 in Section 7.3
shows that the second singular value of $A$ is the maximum of $\|A\mathbf{x}\|$ over all unit vectors
that are orthogonal to $\mathbf{v}_1$, and this maximum is attained at the second unit eigenvector,
$\mathbf{v}_2$ (Exercise 22). For the $\mathbf{v}_2$ in Example 1,

$$A\mathbf{v}_2 = \begin{bmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{bmatrix} \begin{bmatrix} -2/3 \\ -1/3 \\ 2/3 \end{bmatrix} = \begin{bmatrix} 3 \\ -9 \end{bmatrix}$$

This point is on the minor axis of the ellipse in Figure 1, just as $A\mathbf{v}_1$ is on the major
axis. (See Figure 2.) The first two singular values of $A$ are the lengths of the major and
minor semiaxes of the ellipse. ■


The fact that $A\mathbf{v}_1$ and $A\mathbf{v}_2$ are orthogonal in Figure 2 is no accident, as the next
theorem shows.


THEOREM 9 Suppose $\{\mathbf{v}_1, \dots, \mathbf{v}_n\}$ is an orthonormal basis of $\mathbb{R}^n$ consisting of eigenvectors of
$A^TA$, arranged so that the corresponding eigenvalues of $A^TA$ satisfy $\lambda_1 \ge \cdots \ge \lambda_n$,
and suppose $A$ has $r$ nonzero singular values. Then $\{A\mathbf{v}_1, \dots, A\mathbf{v}_r\}$ is an orthogonal
basis for Col $A$, and rank $A = r$.


PROOF Because $\mathbf{v}_i$ and $\lambda_j\mathbf{v}_j$ are orthogonal for $i \ne j$,

$$(A\mathbf{v}_i)^T(A\mathbf{v}_j) = \mathbf{v}_i^TA^TA\mathbf{v}_j = \mathbf{v}_i^T(\lambda_j\mathbf{v}_j) = 0$$

Thus $\{A\mathbf{v}_1, \dots, A\mathbf{v}_n\}$ is an orthogonal set. Furthermore, since the lengths of the vectors
$A\mathbf{v}_1, \dots, A\mathbf{v}_n$ are the singular values of $A$, and since there are $r$ nonzero singular
values, $A\mathbf{v}_i \ne \mathbf{0}$ if and only if $1 \le i \le r$. So $A\mathbf{v}_1, \dots, A\mathbf{v}_r$ are linearly independent
vectors, and they are in Col $A$. Finally, for any $\mathbf{y}$ in Col $A$, say $\mathbf{y} = A\mathbf{x}$, we can write
$\mathbf{x} = c_1\mathbf{v}_1 + \cdots + c_n\mathbf{v}_n$, and

$$\begin{aligned}
\mathbf{y} = A\mathbf{x} &= c_1A\mathbf{v}_1 + \cdots + c_rA\mathbf{v}_r + c_{r+1}A\mathbf{v}_{r+1} + \cdots + c_nA\mathbf{v}_n \\
&= c_1A\mathbf{v}_1 + \cdots + c_rA\mathbf{v}_r + \mathbf{0} + \cdots + \mathbf{0}
\end{aligned}$$

Thus $\mathbf{y}$ is in Span $\{A\mathbf{v}_1, \dots, A\mathbf{v}_r\}$, which shows that $\{A\mathbf{v}_1, \dots, A\mathbf{v}_r\}$ is an (orthogonal)
basis for Col $A$. Hence rank $A$ = dim Col $A = r$. ■

NUMERICAL NOTE

In some cases, the rank of A may be very sensitive to small changes in the entries 
of A . The obvious method of counting the number of pivot columns in A does 
not work well if A is row reduced by a computer. Roundoff error often creates an 
echelon form with full rank. 

In practice, the most reliable way to estimate the rank of a large matrix A 
is to count the number of nonzero singular values. In this case, extremely small 
nonzero singular values are assumed to be zero for all practical purposes, and the 
effective rank of the matrix is the number obtained by counting the remaining 
nonzero singular values. 1 
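
The idea of an effective rank can be sketched in a few lines of Python. This is only an illustration: the function name `effective_rank`, the sample singular values, and the default tolerance (largest singular value times max(m, n) times machine epsilon, a common convention that is assumed here) are not taken from the text.

```python
import sys

def effective_rank(singular_values, n_rows, n_cols, tol=None):
    """Count the singular values that exceed a tolerance.

    The default tolerance is an assumed convention:
    (largest singular value) * max(m, n) * machine epsilon.
    """
    if not singular_values:
        return 0
    if tol is None:
        tol = max(singular_values) * max(n_rows, n_cols) * sys.float_info.epsilon
    return sum(1 for s in singular_values if s > tol)

# Hypothetical singular values of a 3 x 3 matrix; the last one is roundoff noise.
sigmas = [18.97, 9.49, 3.1e-16]
print(effective_rank(sigmas, 3, 3))
```

The appropriate tolerance depends on the application and on how accurately the entries of $A$ are known, which is one reason rank estimation is not a simple problem.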


The Singular Value Decomposition 


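The decomposition of $A$ involves an $m \times n$ “diagonal” matrix $\Sigma$ of the form

$$\Sigma = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix} \tag{3}$$

in which the bottom block row has $m - r$ rows and the right block column has $n - r$ columns,
and where $D$ is an $r \times r$ diagonal matrix for some $r$ not exceeding the smaller of $m$ and $n$.
(If $r$ equals $m$ or $n$ or both, some or all of the zero matrices do not appear.)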


THEOREM 10 The Singular Value Decomposition

Let $A$ be an $m \times n$ matrix with rank $r$. Then there exists an $m \times n$ matrix $\Sigma$ as
in (3) for which the diagonal entries in $D$ are the first $r$ singular values of $A$,
$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$, and there exist an $m \times m$ orthogonal matrix $U$ and an
$n \times n$ orthogonal matrix $V$ such that

$$A = U\Sigma V^T$$

Any factorization $A = U\Sigma V^T$, with $U$ and $V$ orthogonal, $\Sigma$ as in (3), and positive
diagonal entries in $D$, is called a singular value decomposition (or SVD) of $A$. The
matrices $U$ and $V$ are not uniquely determined by $A$, but the diagonal entries of $\Sigma$
are necessarily the singular values of $A$. See Exercise 19. The columns of $U$ in such a
decomposition are called left singular vectors of $A$, and the columns of $V$ are called
right singular vectors of $A$.


1 In general, rank estimation is not a simple problem. For a discussion of the subtle issues involved, see 
Philip E. Gill, Walter Murray, and Margaret H. Wright, Numerical Linear Algebra and Optimization , vol. 1 
(Redwood City, CA: Addison-Wesley, 1991), Sec. 5.8. 
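PROOF Let $\lambda_i$ and $\mathbf{v}_i$ be as in Theorem 9, so that $\{A\mathbf{v}_1, \dots, A\mathbf{v}_r\}$ is an orthogonal basis
for Col $A$. Normalize each $A\mathbf{v}_i$ to obtain an orthonormal basis $\{\mathbf{u}_1, \dots, \mathbf{u}_r\}$, where

$$\mathbf{u}_i = \frac{1}{\|A\mathbf{v}_i\|}A\mathbf{v}_i = \frac{1}{\sigma_i}A\mathbf{v}_i$$

and

$$A\mathbf{v}_i = \sigma_i\mathbf{u}_i \quad (1 \le i \le r) \tag{4}$$

Now extend $\{\mathbf{u}_1, \dots, \mathbf{u}_r\}$ to an orthonormal basis $\{\mathbf{u}_1, \dots, \mathbf{u}_m\}$ of $\mathbb{R}^m$, and let

$$U = [\,\mathbf{u}_1 \ \ \mathbf{u}_2 \ \cdots \ \mathbf{u}_m\,] \quad \text{and} \quad V = [\,\mathbf{v}_1 \ \ \mathbf{v}_2 \ \cdots \ \mathbf{v}_n\,]$$

By construction, $U$ and $V$ are orthogonal matrices. Also, from (4),

$$AV = [\,A\mathbf{v}_1 \ \cdots \ A\mathbf{v}_r \ \ \mathbf{0} \ \cdots \ \mathbf{0}\,] = [\,\sigma_1\mathbf{u}_1 \ \cdots \ \sigma_r\mathbf{u}_r \ \ \mathbf{0} \ \cdots \ \mathbf{0}\,]$$

Let $D$ be the diagonal matrix with diagonal entries $\sigma_1, \dots, \sigma_r$, and let $\Sigma$ be as in
(3) above. Then

$$\begin{aligned}
U\Sigma &= [\,\mathbf{u}_1 \ \ \mathbf{u}_2 \ \cdots \ \mathbf{u}_m\,] \begin{bmatrix} \operatorname{diag}(\sigma_1, \dots, \sigma_r) & 0 \\ 0 & 0 \end{bmatrix} \\
&= [\,\sigma_1\mathbf{u}_1 \ \cdots \ \sigma_r\mathbf{u}_r \ \ \mathbf{0} \ \cdots \ \mathbf{0}\,] \\
&= AV
\end{aligned}$$

Since $V$ is an orthogonal matrix, $U\Sigma V^T = AVV^T = A$. ■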

The next two examples focus attention on the internal structure of a singular value 
decomposition. An efficient and numerically stable algorithm for this decomposition 
would use a different approach. See the Numerical Note at the end of the section. 


EXAMPLE 3 Use the results of Examples 1 and 2 to construct a singular value
decomposition of $A = \begin{bmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{bmatrix}$.

SOLUTION A construction can be divided into three steps.

Step 1. Find an orthogonal diagonalization of $A^TA$. That is, find the eigenvalues of
$A^TA$ and a corresponding orthonormal set of eigenvectors. If $A$ had only two columns,
the calculations could be done by hand. Larger matrices usually require a matrix
program.² However, for the matrix $A$ here, the eigendata for $A^TA$ are provided in Example 1.

Step 2. Set up $V$ and $\Sigma$. Arrange the eigenvalues of $A^TA$ in decreasing order. In
Example 1, the eigenvalues are already listed in decreasing order: 360, 90, and 0. The
corresponding unit eigenvectors, $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$, are the right singular vectors of $A$. Using
Example 1, construct

$$V = [\,\mathbf{v}_1 \ \ \mathbf{v}_2 \ \ \mathbf{v}_3\,] = \begin{bmatrix} 1/3 & -2/3 & 2/3 \\ 2/3 & -1/3 & -2/3 \\ 2/3 & 2/3 & 1/3 \end{bmatrix}$$


2 See the Study Guide for software and graphing calculator commands. MATLAB, for instance, can produce 
both the eigenvalues and the eigenvectors with one command, eig. 
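The square roots of the eigenvalues are the singular values:

$$\sigma_1 = 6\sqrt{10}, \quad \sigma_2 = 3\sqrt{10}, \quad \sigma_3 = 0$$

The nonzero singular values are the diagonal entries of $D$. The matrix $\Sigma$ is the same
size as $A$, with $D$ in its upper left corner and with 0's elsewhere.

$$D = \begin{bmatrix} 6\sqrt{10} & 0 \\ 0 & 3\sqrt{10} \end{bmatrix}, \quad \Sigma = [\,D \ \ 0\,] = \begin{bmatrix} 6\sqrt{10} & 0 & 0 \\ 0 & 3\sqrt{10} & 0 \end{bmatrix}$$

Step 3. Construct $U$. When $A$ has rank $r$, the first $r$ columns of $U$ are the normalized
vectors obtained from $A\mathbf{v}_1, \dots, A\mathbf{v}_r$. In this example, $A$ has two nonzero singular
values, so rank $A = 2$. Recall from equation (2) and the paragraph before Example 2 that
$\|A\mathbf{v}_1\| = \sigma_1$ and $\|A\mathbf{v}_2\| = \sigma_2$. Thus

$$\mathbf{u}_1 = \frac{1}{\sigma_1}A\mathbf{v}_1 = \frac{1}{6\sqrt{10}}\begin{bmatrix} 18 \\ 6 \end{bmatrix} = \begin{bmatrix} 3/\sqrt{10} \\ 1/\sqrt{10} \end{bmatrix}$$

$$\mathbf{u}_2 = \frac{1}{\sigma_2}A\mathbf{v}_2 = \frac{1}{3\sqrt{10}}\begin{bmatrix} 3 \\ -9 \end{bmatrix} = \begin{bmatrix} 1/\sqrt{10} \\ -3/\sqrt{10} \end{bmatrix}$$

Note that $\{\mathbf{u}_1, \mathbf{u}_2\}$ is already a basis for $\mathbb{R}^2$. Thus no additional vectors are needed for
$U$, and $U = [\,\mathbf{u}_1 \ \ \mathbf{u}_2\,]$. The singular value decomposition of $A$ is

$$A = \underbrace{\begin{bmatrix} 3/\sqrt{10} & 1/\sqrt{10} \\ 1/\sqrt{10} & -3/\sqrt{10} \end{bmatrix}}_{U} \underbrace{\begin{bmatrix} 6\sqrt{10} & 0 & 0 \\ 0 & 3\sqrt{10} & 0 \end{bmatrix}}_{\Sigma} \underbrace{\begin{bmatrix} 1/3 & 2/3 & 2/3 \\ -2/3 & -1/3 & 2/3 \\ 2/3 & -2/3 & 1/3 \end{bmatrix}}_{V^T}$$

■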


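EXAMPLE 4 Find a singular value decomposition of $A = \begin{bmatrix} 1 & -1 \\ -2 & 2 \\ 2 & -2 \end{bmatrix}$.

SOLUTION First, compute $A^TA = \begin{bmatrix} 9 & -9 \\ -9 & 9 \end{bmatrix}$. The eigenvalues of $A^TA$ are 18 and 0,
with corresponding unit eigenvectors

$$\mathbf{v}_1 = \begin{bmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$$

These unit vectors form the columns of $V$:

$$V = [\,\mathbf{v}_1 \ \ \mathbf{v}_2\,] = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}$$

The singular values are $\sigma_1 = \sqrt{18} = 3\sqrt{2}$ and $\sigma_2 = 0$. Since there is only one nonzero
singular value, the “matrix” $D$ may be written as a single number. That is, $D = 3\sqrt{2}$.
The matrix $\Sigma$ is the same size as $A$, with $D$ in its upper left corner:

$$\Sigma = \begin{bmatrix} D & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 3\sqrt{2} & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$$

To construct $U$, first construct $A\mathbf{v}_1$ and $A\mathbf{v}_2$:

$$A\mathbf{v}_1 = \begin{bmatrix} 2/\sqrt{2} \\ -4/\sqrt{2} \\ 4/\sqrt{2} \end{bmatrix}, \quad A\mathbf{v}_2 = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

As a check on the calculations, verify that $\|A\mathbf{v}_1\| = \sigma_1 = 3\sqrt{2}$. Of course, $A\mathbf{v}_2 = \mathbf{0}$
because $\|A\mathbf{v}_2\| = \sigma_2 = 0$. The only column found for $U$ so far is

$$\mathbf{u}_1 = \frac{1}{3\sqrt{2}}A\mathbf{v}_1 = \begin{bmatrix} 1/3 \\ -2/3 \\ 2/3 \end{bmatrix}$$

FIGURE 3
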

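The other columns of $U$ are found by extending the set $\{\mathbf{u}_1\}$ to an orthonormal basis for
$\mathbb{R}^3$. In this case, we need two orthogonal unit vectors $\mathbf{u}_2$ and $\mathbf{u}_3$ that are orthogonal to $\mathbf{u}_1$.
(See Figure 3.) Each vector must satisfy $\mathbf{u}_1^T\mathbf{x} = 0$, which is equivalent to the equation
$x_1 - 2x_2 + 2x_3 = 0$. A basis for the solution set of this equation is

$$\mathbf{w}_1 = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf{w}_2 = \begin{bmatrix} -2 \\ 0 \\ 1 \end{bmatrix}$$

(Check that $\mathbf{w}_1$ and $\mathbf{w}_2$ are each orthogonal to $\mathbf{u}_1$.) Apply the Gram-Schmidt process
(with normalizations) to $\{\mathbf{w}_1, \mathbf{w}_2\}$, and obtain

$$\mathbf{u}_2 = \begin{bmatrix} 2/\sqrt{5} \\ 1/\sqrt{5} \\ 0 \end{bmatrix}, \quad \mathbf{u}_3 = \begin{bmatrix} -2/\sqrt{45} \\ 4/\sqrt{45} \\ 5/\sqrt{45} \end{bmatrix}$$

Finally, set $U = [\,\mathbf{u}_1 \ \ \mathbf{u}_2 \ \ \mathbf{u}_3\,]$, take $\Sigma$ and $V^T$ from above, and write

$$A = \begin{bmatrix} 1 & -1 \\ -2 & 2 \\ 2 & -2 \end{bmatrix} = \begin{bmatrix} 1/3 & 2/\sqrt{5} & -2/\sqrt{45} \\ -2/3 & 1/\sqrt{5} & 4/\sqrt{45} \\ 2/3 & 0 & 5/\sqrt{45} \end{bmatrix} \begin{bmatrix} 3\sqrt{2} & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}$$

■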




Applications of the Singular Value Decomposition 

The SVD is often used to estimate the rank of a matrix, as noted above. Several other
numerical applications are described briefly below, and an application to image processing
is presented in Section 7.5.

EXAMPLE 5 (The Condition Number) Most numerical calculations involving an
equation $A\mathbf{x} = \mathbf{b}$ are as reliable as possible when the SVD of $A$ is used. The two
orthogonal matrices $U$ and $V$ do not affect lengths of vectors or angles between vectors
(Theorem 7 in Section 6.2). Any possible instabilities in numerical calculations are
identified in $\Sigma$. If the singular values of $A$ are extremely large or small, roundoff errors
are almost inevitable, but an error analysis is aided by knowing the entries in $\Sigma$ and $V$.

If $A$ is an invertible $n \times n$ matrix, then the ratio $\sigma_1/\sigma_n$ of the largest and smallest
singular values gives the condition number of $A$. Exercises 41-43 in Section 2.3
showed how the condition number affects the sensitivity of a solution of $A\mathbf{x} = \mathbf{b}$ to
changes (or errors) in the entries of $A$. (Actually, a “condition number” of $A$ can be
computed in several ways, but the definition given here is widely used for studying
$A\mathbf{x} = \mathbf{b}$.) ■
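
For a $2 \times 2$ matrix the ratio $\sigma_1/\sigma_n$ can be computed by hand from the eigenvalues of $A^TA$. The sketch below follows that hand-calculation route in plain Python; the function names and the sample matrix are hypothetical, and forming $A^TA$ explicitly is done here only for illustration, not as a recommended numerical method.

```python
import math

def singular_values_2x2(A):
    """Singular values of a 2 x 2 matrix from the eigenvalues of A^T A."""
    (a, b), (c, d) = A
    # Entries of the symmetric matrix A^T A = [[p, q], [q, r]].
    p = a * a + c * c
    q = a * b + c * d
    r = b * b + d * d
    # Eigenvalues of a symmetric 2 x 2 matrix via the quadratic formula.
    mean = (p + r) / 2
    spread = math.sqrt(((p - r) / 2) ** 2 + q * q)
    lam1 = mean + spread
    lam2 = max(mean - spread, 0.0)   # guard against tiny negative roundoff
    return math.sqrt(lam1), math.sqrt(lam2)

def condition_number(A):
    """Ratio of the largest to the smallest singular value."""
    s1, s2 = singular_values_2x2(A)
    if s2 == 0:
        raise ValueError("matrix is singular; condition number is undefined")
    return s1 / s2

A = [[4, 0], [3, -5]]   # hypothetical invertible matrix
kappa = condition_number(A)
print(kappa)
```

Here $A^TA$ has eigenvalues 40 and 10, so the singular values are $2\sqrt{10}$ and $\sqrt{10}$ and the ratio is 2, a well-conditioned case.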

EXAMPLE 6 (Bases for Fundamental Subspaces) Given an SVD for an $m \times n$
matrix $A$, let $\mathbf{u}_1, \dots, \mathbf{u}_m$ be the left singular vectors, $\mathbf{v}_1, \dots, \mathbf{v}_n$ the right singular vectors,
and $\sigma_1, \dots, \sigma_n$ the singular values, and let $r$ be the rank of $A$. By Theorem 9,

$$\{\mathbf{u}_1, \dots, \mathbf{u}_r\} \tag{5}$$

is an orthonormal basis for Col $A$.

Figure: The fundamental subspaces in Example 4.

Recall from Theorem 3 in Section 6.1 that $(\text{Col } A)^\perp = \text{Nul } A^T$. Hence

$$\{\mathbf{u}_{r+1}, \dots, \mathbf{u}_m\} \tag{6}$$

is an orthonormal basis for Nul $A^T$.

Since $\|A\mathbf{v}_i\| = \sigma_i$ for $1 \le i \le n$, and $\sigma_i$ is 0 if and only if $i > r$, the vectors
$\mathbf{v}_{r+1}, \dots, \mathbf{v}_n$ span a subspace of Nul $A$ of dimension $n - r$. By the Rank Theorem,
dim Nul $A = n - \text{rank } A$. It follows that

$$\{\mathbf{v}_{r+1}, \dots, \mathbf{v}_n\} \tag{7}$$

is an orthonormal basis for Nul $A$, by the Basis Theorem (in Section 4.5).

From (5) and (6), the orthogonal complement of Nul $A^T$ is Col $A$. Interchanging $A$
and $A^T$, note that $(\text{Nul } A)^\perp = \text{Col } A^T = \text{Row } A$. Hence, from (7),

$$\{\mathbf{v}_1, \dots, \mathbf{v}_r\} \tag{8}$$

is an orthonormal basis for Row $A$.

Figure 4 summarizes (5)-(8), but shows the orthogonal basis $\{\sigma_1\mathbf{u}_1, \dots, \sigma_r\mathbf{u}_r\}$ for
Col $A$ instead of the normalized basis, to remind you that $A\mathbf{v}_i = \sigma_i\mathbf{u}_i$ for $1 \le i \le r$.
Explicit orthonormal bases for the four fundamental subspaces determined by $A$ are
useful in some calculations, particularly in constrained optimization problems. ■


FIGURE 4 The four fundamental subspaces and the
action of $A$.


The four fundamental subspaces and the concept of singular values provide the final
statements of the Invertible Matrix Theorem. (Recall that statements about $A^T$ have been
omitted from the theorem, to avoid nearly doubling the number of statements.) The other
statements were given in Sections 2.3, 2.9, 3.2, 4.6, and 5.2.


THEOREM The Invertible Matrix Theorem (concluded)

Let $A$ be an $n \times n$ matrix. Then the following statements are each equivalent to
the statement that $A$ is an invertible matrix.

u. $(\text{Col } A)^\perp = \{\mathbf{0}\}$.

v. $(\text{Nul } A)^\perp = \mathbb{R}^n$.

w. Row $A = \mathbb{R}^n$.

x. $A$ has $n$ nonzero singular values.
EXAMPLE 7 (Reduced SVD and the Pseudoinverse of A) When $\Sigma$ contains rows or
columns of zeros, a more compact decomposition of $A$ is possible. Using the notation
established above, let $r = \text{rank } A$, and partition $U$ and $V$ into submatrices whose first
blocks contain $r$ columns:

$$U = [\,U_r \ \ U_{m-r}\,], \quad \text{where } U_r = [\,\mathbf{u}_1 \ \cdots \ \mathbf{u}_r\,]$$
$$V = [\,V_r \ \ V_{n-r}\,], \quad \text{where } V_r = [\,\mathbf{v}_1 \ \cdots \ \mathbf{v}_r\,]$$

Then $U_r$ is $m \times r$ and $V_r$ is $n \times r$. (To simplify notation, we consider $U_{m-r}$ or $V_{n-r}$
even though one of them may have no columns.) Then partitioned matrix multiplication
shows that

$$A = [\,U_r \ \ U_{m-r}\,] \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_r^T \\ V_{n-r}^T \end{bmatrix} = U_rDV_r^T \tag{9}$$

This factorization of $A$ is called a reduced singular value decomposition of $A$. Since
the diagonal entries in $D$ are nonzero, $D$ is invertible. The following matrix is called
the pseudoinverse (also, the Moore-Penrose inverse) of $A$:

$$A^+ = V_rD^{-1}U_r^T \tag{10}$$

Supplementary Exercises 12-14 at the end of the chapter explore some of the properties
of the reduced singular value decomposition and the pseudoinverse. ■


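EXAMPLE 8 (Least-Squares Solution) Given the equation $A\mathbf{x} = \mathbf{b}$, use the
pseudoinverse of $A$ in (10) to define

$$\hat{\mathbf{x}} = A^+\mathbf{b} = V_rD^{-1}U_r^T\mathbf{b}$$

Then, from the SVD in (9),

$$\begin{aligned}
A\hat{\mathbf{x}} &= (U_rDV_r^T)(V_rD^{-1}U_r^T\mathbf{b}) \\
&= U_rDD^{-1}U_r^T\mathbf{b} && \text{Because } V_r^TV_r = I_r \\
&= U_rU_r^T\mathbf{b}
\end{aligned}$$

It follows from (5) that $U_rU_r^T\mathbf{b}$ is the orthogonal projection $\hat{\mathbf{b}}$ of $\mathbf{b}$ onto Col $A$. (See
Theorem 10 in Section 6.3.) Thus $\hat{\mathbf{x}}$ is a least-squares solution of $A\mathbf{x} = \mathbf{b}$. In fact, this $\hat{\mathbf{x}}$
has the smallest length among all least-squares solutions of $A\mathbf{x} = \mathbf{b}$. See Supplementary
Exercise 14. ■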
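
The formula $\hat{\mathbf{x}} = V_rD^{-1}U_r^T\mathbf{b}$ can be tried on the rank-one matrix of Example 4, whose reduced SVD uses only $\mathbf{u}_1$, $\sigma_1$, and $\mathbf{v}_1$. The sketch below is in plain Python; the right-hand side `b` and the function name are hypothetical choices made for illustration.

```python
import math

# Reduced SVD of the matrix in Example 4: A = U_r D V_r^T with r = 1.
A = [[1, -1], [-2, 2], [2, -2]]
u1 = [1/3, -2/3, 2/3]                        # the single column of U_r
sigma1 = 3 * math.sqrt(2)                    # the 1 x 1 matrix D
v1 = [1 / math.sqrt(2), -1 / math.sqrt(2)]   # the single column of V_r

def dot(x, y):
    """Inner product of two vectors stored as lists."""
    return sum(a * b for a, b in zip(x, y))

def least_squares_min_norm(b):
    """Return x_hat = V_r D^{-1} U_r^T b for this rank-one matrix."""
    coeff = dot(u1, b) / sigma1
    return [coeff * vj for vj in v1]

b = [1.0, 0.0, 0.0]                          # hypothetical right-hand side
x_hat = least_squares_min_norm(b)

# A x_hat should equal the orthogonal projection of b onto Col A = Span{u1}.
A_x_hat = [dot(row, x_hat) for row in A]
b_hat = [dot(u1, b) * ui for ui in u1]
print(x_hat, A_x_hat, b_hat)
```

For this choice of `b`, both `A_x_hat` and `b_hat` should come out as (1/9, -2/9, 2/9) up to roundoff, which illustrates that $A\hat{\mathbf{x}} = U_rU_r^T\mathbf{b}$.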


NUMERICAL NOTE

Examples 1-4 and the exercises illustrate the concept of singular values and
suggest how to perform calculations by hand. In practice, the computation of $A^TA$
should be avoided, since any errors in the entries of $A$ are squared in the entries
of $A^TA$. There exist fast iterative methods that produce the singular values and
singular vectors of $A$ accurately to many decimal places.


Further Reading 

Horn, Roger A., and Charles R. Johnson, Matrix Analysis (Cambridge: Cambridge 
University Press, 1990). 

Long, Cliff, “Visualization of Matrix Singular Value Decomposition.” Mathematics 
Magazine 56 (1983), pp. 161-167. 
Moler, C. B., and D. Morrison, “Singular Value Analysis of Cryptograms.” Amer. Math.
Monthly 90 (1983), pp. 78-87.

Strang, Gilbert, Linear Algebra and Its Applications, 4th ed. (Belmont, CA:
Brooks/Cole, 2005).

Watkins, David S., Fundamentals of Matrix Computations (New York: Wiley, 1991),
pp. 390-398, 409-421.



PRACTICE PROBLEMS 


1. Given a singular value decomposition, $A = U\Sigma V^T$, find an SVD of $A^T$. How are
the singular values of $A$ and $A^T$ related?

2. For any $n \times n$ matrix $A$, use the SVD to show that there is an $n \times n$ orthogonal matrix
$Q$ such that $AA^T = Q^T(A^TA)Q$.

Remark: Practice Problem 2 establishes that for any $n \times n$ matrix $A$, the matrices $AA^T$
and $A^TA$ are orthogonally similar.


7.4 EXERCISES 


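Find the singular values of the matrices in Exercises 1-4.

1. $\begin{bmatrix} 1 & 0 \\ 0 & -3 \end{bmatrix}$

2. $\begin{bmatrix} -3 & 0 \\ 0 & 0 \end{bmatrix}$

3. $\begin{bmatrix} 2 & 3 \\ 0 & 2 \end{bmatrix}$

4. $\begin{bmatrix} 3 & 0 \\ 8 & 3 \end{bmatrix}$

Find an SVD of each matrix in Exercises 5-12. [Hint: In Exercise 11, one choice for $U$ is
$\begin{bmatrix} -1/3 & 2/3 & 2/3 \\ 2/3 & -1/3 & 2/3 \\ 2/3 & 2/3 & -1/3 \end{bmatrix}$. In Exercise 12, one column of $U$ can be
$\begin{bmatrix} 1/\sqrt{6} \\ -2/\sqrt{6} \\ 1/\sqrt{6} \end{bmatrix}$.]

5. $\begin{bmatrix} -2 & 0 \\ 0 & 0 \end{bmatrix}$

6. $\begin{bmatrix} -3 & 0 \\ 0 & -2 \end{bmatrix}$

7. $\begin{bmatrix} 2 & -1 \\ 2 & 2 \end{bmatrix}$

8. $\begin{bmatrix} 4 & 6 \\ 0 & 4 \end{bmatrix}$

9. $\begin{bmatrix} 3 & -3 \\ 0 & 0 \\ 1 & 1 \end{bmatrix}$

10. $\begin{bmatrix} 7 & 1 \\ 5 & 5 \\ 0 & 0 \end{bmatrix}$

11. $\begin{bmatrix} -3 & 1 \\ 6 & -2 \\ 6 & -2 \end{bmatrix}$

12. $\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ -1 & 1 \end{bmatrix}$

13. Find the SVD of $A = \begin{bmatrix} 3 & 2 & 2 \\ 2 & 3 & -2 \end{bmatrix}$. [Hint: Work with $A^T$.]

14. In Exercise 7, find a unit vector $\mathbf{x}$ at which $A\mathbf{x}$ has maximum
length.

15. Suppose the factorization below is an SVD of a matrix $A$,
with the entries in $U$ and $V$ rounded to two decimal places.

$$A = \begin{bmatrix} .40 & -.78 & .47 \\ .37 & -.33 & -.87 \\ -.84 & -.52 & -.16 \end{bmatrix} \begin{bmatrix} 7.10 & 0 & 0 \\ 0 & 3.10 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} .30 & -.51 & -.81 \\ .76 & .64 & -.12 \\ .58 & -.58 & .58 \end{bmatrix}$$

a. What is the rank of $A$?

b. Use this decomposition of $A$, with no calculations, to
write a basis for Col $A$ and a basis for Nul $A$. [Hint: First
write the columns of $V$.]

16. Repeat Exercise 15 for the following SVD of a $3 \times 4$
matrix $A$:

$$A = \begin{bmatrix} -.86 & -.11 & -.50 \\ .31 & .68 & -.67 \\ .41 & -.73 & -.55 \end{bmatrix} \begin{bmatrix} 12.48 & 0 & 0 & 0 \\ 0 & 6.34 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} .66 & -.03 & -.35 & .66 \\ -.13 & -.90 & -.39 & -.13 \\ .65 & .08 & -.16 & -.73 \\ -.34 & .42 & -.84 & -.08 \end{bmatrix}$$
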

In Exercises 17-24, $A$ is an $m \times n$ matrix with a singular value
decomposition $A = U\Sigma V^T$, where $U$ is an $m \times m$ orthogonal
matrix, $\Sigma$ is an $m \times n$ “diagonal” matrix with $r$ positive entries
and no negative entries, and $V$ is an $n \times n$ orthogonal matrix.
Justify each answer.

17. Show that if $A$ is square, then $|\det A|$ is the product of the
singular values of $A$.

18. Suppose $A$ is square and invertible. Find a singular value
decomposition of $A^{-1}$.

19. Show that the columns of $V$ are eigenvectors of $A^TA$, the
columns of $U$ are eigenvectors of $AA^T$, and the diagonal
entries of $\Sigma$ are the singular values of $A$. [Hint: Use the SVD
to compute $A^TA$ and $AA^T$.]

20. Show that if $P$ is an orthogonal $m \times m$ matrix, then $PA$ has
the same singular values as $A$.

21. Justify the statement in Example 2 that the second singular
value of a matrix $A$ is the maximum of $\|A\mathbf{x}\|$ as $\mathbf{x}$ varies
over all unit vectors orthogonal to $\mathbf{v}_1$, with $\mathbf{v}_1$ a right singular
vector corresponding to the first singular value of $A$. [Hint:
Use Theorem 7 in Section 7.3.]

22. Show that if $A$ is an $n \times n$ positive definite matrix, then an
orthogonal diagonalization $A = PDP^T$ is a singular value
decomposition of $A$.

23. Let $U = [\,\mathbf{u}_1 \ \cdots \ \mathbf{u}_m\,]$ and $V = [\,\mathbf{v}_1 \ \cdots \ \mathbf{v}_n\,]$, where the
$\mathbf{u}_i$ and $\mathbf{v}_i$ are as in Theorem 10. Show that

$$A = \sigma_1\mathbf{u}_1\mathbf{v}_1^T + \sigma_2\mathbf{u}_2\mathbf{v}_2^T + \cdots + \sigma_r\mathbf{u}_r\mathbf{v}_r^T$$

24. Using the notation of Exercise 23, show that $A^T\mathbf{u}_j = \sigma_j\mathbf{v}_j$
for $1 \le j \le r = \text{rank } A$.

25. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation. Describe how
to find a basis $\mathcal{B}$ for $\mathbb{R}^n$ and a basis $\mathcal{C}$ for $\mathbb{R}^m$ such that the
matrix for $T$ relative to $\mathcal{B}$ and $\mathcal{C}$ is an $m \times n$ “diagonal”
matrix.


[M] Compute an SVD of each matrix in Exercises 26 and 27.
Report the final matrix entries accurate to two decimal places. Use
the method of Examples 3 and 4.

26. $A = \begin{bmatrix} -18 & 13 & -4 & 4 \\ 2 & 19 & -4 & 12 \\ -14 & 11 & -12 & 8 \\ -2 & 21 & 4 & 8 \end{bmatrix}$

27. $A = \begin{bmatrix} 6 & -8 & -4 & 5 & -4 \\ 2 & 7 & -5 & -6 & 4 \\ 0 & -1 & -8 & 2 & 2 \\ -1 & -2 & 4 & 4 & -8 \end{bmatrix}$

28. [M] Compute the singular values of the $4 \times 4$ matrix in
Exercise 9 in Section 2.3, and compute the condition number
$\sigma_1/\sigma_4$.

29. [M] Compute the singular values of the $5 \times 5$ matrix in
Exercise 10 in Section 2.3, and compute the condition number
$\sigma_1/\sigma_5$.


SOLUTIONS TO PRACTICE PROBLEMS

1. If $A = U\Sigma V^T$, where $\Sigma$ is $m \times n$, then $A^T = (V^T)^T\Sigma^TU^T = V\Sigma^TU^T$. This is an
SVD of $A^T$ because $V$ and $U$ are orthogonal matrices and $\Sigma^T$ is an $n \times m$ “diagonal”
matrix. Since $\Sigma$ and $\Sigma^T$ have the same nonzero diagonal entries, $A$ and $A^T$ have the
same nonzero singular values. [Note: If $A$ is $2 \times n$, then $AA^T$ is only $2 \times 2$ and its
eigenvalues may be easier to compute (by hand) than the eigenvalues of $A^TA$.]

2. Use the SVD to write $A = U\Sigma V^T$, where $U$ and $V$ are $n \times n$ orthogonal matrices
and $\Sigma$ is an $n \times n$ diagonal matrix. Notice that $U^TU = I = V^TV$ and $\Sigma^T = \Sigma$,
since $U$ and $V$ are orthogonal matrices and $\Sigma$ is a diagonal matrix. Substituting the
SVD for $A$ into $AA^T$ and $A^TA$ results in

$$AA^T = U\Sigma V^T(U\Sigma V^T)^T = U\Sigma V^TV\Sigma^TU^T = U\Sigma\Sigma^TU^T = U\Sigma^2U^T,$$

and

$$A^TA = (U\Sigma V^T)^TU\Sigma V^T = V\Sigma^TU^TU\Sigma V^T = V\Sigma^T\Sigma V^T = V\Sigma^2V^T.$$

Let $Q = VU^T$. Then

$$Q^T(A^TA)Q = (VU^T)^T(V\Sigma^2V^T)(VU^T) = UV^TV\Sigma^2V^TVU^T = U\Sigma^2U^T = AA^T.$$

7.5 APPLICATIONS TO IMAGE PROCESSING AND STATISTICS


The satellite photographs in this chapter’s introduction provide an example of
multidimensional, or multivariate, data, that is, information organized so that each datum in the data
set is identified with a point (vector) in $\mathbb{R}^n$. The main goal of this section is to explain a
technique, called principal component analysis, used to analyze such multivariate data.
The calculations will illustrate the use of orthogonal diagonalization and the singular
value decomposition.
Principal component analysis can be applied to any data that consist of lists of
measurements made on a collection of objects or individuals. For instance, consider a
chemical process that produces a plastic material. To monitor the process, 300 samples
are taken of the material produced, and each sample is subjected to a battery of eight
tests, such as melting point, density, ductility, tensile strength, and so on. The laboratory
report for each sample is a vector in $\mathbb{R}^8$, and the set of such vectors forms an $8 \times 300$
matrix, called the matrix of observations.

Loosely speaking, we can say that the process control data are eight-dimensional.
The next two examples describe data that can be visualized graphically.


EXAMPLE 1 An example of two-dimensional data is given by a set of weights and
heights of $N$ college students. Let $\mathbf{X}_j$ denote the observation vector in $\mathbb{R}^2$ that lists the
weight and height of the $j$th student. If $w$ denotes weight and $h$ height, then the matrix
of observations has the form

$$\begin{bmatrix} w_1 & w_2 & \cdots & w_N \\ h_1 & h_2 & \cdots & h_N \end{bmatrix}$$

whose columns are the observation vectors $\mathbf{X}_1, \mathbf{X}_2, \dots, \mathbf{X}_N$.

The set of observation vectors can be visualized as a two-dimensional scatter plot. See
Figure 1. ■






FIGURE 1 A scatter plot of observation
vectors $\mathbf{X}_1, \dots, \mathbf{X}_N$.

FIGURE 2 A scatter plot of spectral data for a
satellite image.


EXAMPLE 2 The first three photographs of Railroad Valley, Nevada, shown in the
chapter introduction can be viewed as one image of the region, with three spectral
components, because simultaneous measurements of the region were made at three
separate wavelengths. Each photograph gives different information about the same
physical region. For instance, the first pixel in the upper-left corner of each photograph
corresponds to the same place on the ground (about 30 meters by 30 meters). To each
pixel there corresponds an observation vector in $\mathbb{R}^3$ that lists the signal intensities for
that pixel in the three spectral bands.

Typically, the image is $2000 \times 2000$ pixels, so there are 4 million pixels in the
image. The data for the image form a matrix with 3 rows and 4 million columns
(with columns arranged in any convenient order). In this case, the “multidimensional”
character of the data refers to the three spectral dimensions rather than the two spatial
dimensions that naturally belong to any photograph. The data can be visualized as a
cluster of 4 million points in $\mathbb{R}^3$, perhaps as in Figure 2. ■


Mean and Covariance

To prepare for principal component analysis, let $[\,\mathbf{X}_1 \ \cdots \ \mathbf{X}_N\,]$ be a $p \times N$ matrix of
observations, such as described above. The sample mean, $\mathbf{M}$, of the observation vectors
$\mathbf{X}_1, \dots, \mathbf{X}_N$ is given by

$$\mathbf{M} = \frac{1}{N}(\mathbf{X}_1 + \cdots + \mathbf{X}_N)$$

For the data in Figure 1, the sample mean is the point in the “center” of the scatter plot.
For $k = 1, \dots, N$, let

$$\hat{\mathbf{X}}_k = \mathbf{X}_k - \mathbf{M}$$

The columns of the $p \times N$ matrix

$$B = [\,\hat{\mathbf{X}}_1 \ \ \hat{\mathbf{X}}_2 \ \cdots \ \hat{\mathbf{X}}_N\,]$$

have a zero sample mean, and $B$ is said to be in mean-deviation form. When the sample
mean is subtracted from the data in Figure 1, the resulting scatter plot has the form in
Figure 3.

FIGURE 3 Weight-height data in
mean-deviation form.

The (sample) covariance matrix is the $p \times p$ matrix $S$ defined by

$$S = \frac{1}{N-1}BB^T$$

Since any matrix of the form $BB^T$ is positive semidefinite, so is $S$. (See Exercise 25 in
Section 7.2 with $B$ and $B^T$ interchanged.)


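EXAMPLE 3 Three measurements are made on each of four individuals in a random
sample from a population. The observation vectors are

$$\mathbf{X}_1 = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \quad \mathbf{X}_2 = \begin{bmatrix} 4 \\ 2 \\ 13 \end{bmatrix}, \quad \mathbf{X}_3 = \begin{bmatrix} 7 \\ 8 \\ 1 \end{bmatrix}, \quad \mathbf{X}_4 = \begin{bmatrix} 8 \\ 4 \\ 5 \end{bmatrix}$$

Compute the sample mean and the covariance matrix.


SOLUTION The sample mean is

$$\mathbf{M} = \frac{1}{4}\left( \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} + \begin{bmatrix} 4 \\ 2 \\ 13 \end{bmatrix} + \begin{bmatrix} 7 \\ 8 \\ 1 \end{bmatrix} + \begin{bmatrix} 8 \\ 4 \\ 5 \end{bmatrix} \right) = \frac{1}{4}\begin{bmatrix} 20 \\ 16 \\ 20 \end{bmatrix} = \begin{bmatrix} 5 \\ 4 \\ 5 \end{bmatrix}$$

Subtract the sample mean from $\mathbf{X}_1, \dots, \mathbf{X}_4$ to obtain

$$\hat{\mathbf{X}}_1 = \begin{bmatrix} -4 \\ -2 \\ -4 \end{bmatrix}, \quad \hat{\mathbf{X}}_2 = \begin{bmatrix} -1 \\ -2 \\ 8 \end{bmatrix}, \quad \hat{\mathbf{X}}_3 = \begin{bmatrix} 2 \\ 4 \\ -4 \end{bmatrix}, \quad \hat{\mathbf{X}}_4 = \begin{bmatrix} 3 \\ 0 \\ 0 \end{bmatrix}$$

and

$$B = \begin{bmatrix} -4 & -1 & 2 & 3 \\ -2 & -2 & 4 & 0 \\ -4 & 8 & -4 & 0 \end{bmatrix}$$

The sample covariance matrix is

$$S = \frac{1}{3}BB^T = \frac{1}{3}\begin{bmatrix} 30 & 18 & 0 \\ 18 & 24 & -24 \\ 0 & -24 & 96 \end{bmatrix} = \begin{bmatrix} 10 & 6 & 0 \\ 6 & 8 & -8 \\ 0 & -8 & 32 \end{bmatrix}$$

■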
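
The computation in Example 3 (sample mean, mean-deviation form, then $S = \frac{1}{N-1}BB^T$) can be sketched in plain Python as follows. The function names are illustrative, and the observation vectors are stored as rows of a list rather than as columns of a matrix.

```python
# Observation vectors X1, ..., X4 from Example 3 (one vector per row here).
observations = [[1, 2, 1], [4, 2, 13], [7, 8, 1], [8, 4, 5]]

def sample_mean(vectors):
    """Coordinate-wise average of the observation vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def covariance_matrix(vectors):
    """Return S = (1/(N-1)) B B^T, where the columns of B are X_k - M."""
    n = len(vectors)
    mean = sample_mean(vectors)
    deviations = [[x - m for x, m in zip(v, mean)] for v in vectors]
    p = len(mean)
    return [[sum(d[i] * d[j] for d in deviations) / (n - 1)
             for j in range(p)]
            for i in range(p)]

M = sample_mean(observations)
S = covariance_matrix(observations)
print(M)
for row in S:
    print(row)
```

The diagonal of the resulting matrix holds the variances, and the zero in the (1, 3) position reflects the fact that the first and third coordinates are uncorrelated in this sample.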
To discuss the entries in $S = [s_{ij}]$, let $\mathbf{X}$ represent a vector that varies over the
set of observation vectors and denote the coordinates of $\mathbf{X}$ by $x_1, \dots, x_p$. Then $x_1$,
for example, is a scalar that varies over the set of first coordinates of $\mathbf{X}_1, \dots, \mathbf{X}_N$. For
$j = 1, \dots, p$, the diagonal entry $s_{jj}$ in $S$ is called the variance of $x_j$.

The variance of $x_j$ measures the spread of the values of $x_j$. (See Exercise 13.) In
Example 3, the variance of $x_1$ is 10 and the variance of $x_3$ is 32. The fact that 32 is more
than 10 indicates that the set of third entries in the response vectors contains a wider
spread of values than the set of first entries.

The total variance of the data is the sum of the variances on the diagonal of $S$. In
general, the sum of the diagonal entries of a square matrix $S$ is called the trace of the
matrix, written $\operatorname{tr}(S)$. Thus

$$\{\text{total variance}\} = \operatorname{tr}(S)$$

The entry $s_{ij}$ in $S$ for $i \ne j$ is called the covariance of $x_i$ and $x_j$. Observe that
in Example 3, the covariance between $x_1$ and $x_3$ is 0 because the (1, 3)-entry in $S$ is 0.
Statisticians say that $x_1$ and $x_3$ are uncorrelated. Analysis of the multivariate data in
$\mathbf{X}_1, \dots, \mathbf{X}_N$ is greatly simplified when most or all of the variables $x_1, \dots, x_p$ are
uncorrelated, that is, when the covariance matrix of $\mathbf{X}_1, \dots, \mathbf{X}_N$ is diagonal or nearly diagonal.

Principal Component Analysis 

For simplicity, assume that the matrix $[\,\mathbf{X}_1 \ \cdots \ \mathbf{X}_N\,]$ is already in mean-deviation
form. The goal of principal component analysis is to find an orthogonal $p \times p$ matrix
$P = [\,\mathbf{u}_1 \ \cdots \ \mathbf{u}_p\,]$ that determines a change of variable, $\mathbf{X} = P\mathbf{Y}$, or

$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{bmatrix} = [\,\mathbf{u}_1 \ \ \mathbf{u}_2 \ \cdots \ \mathbf{u}_p\,] \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{bmatrix}$$

with the property that the new variables $y_1, \dots, y_p$ are uncorrelated and are arranged in
order of decreasing variance.

The orthogonal change of variable $\mathbf{X} = P\mathbf{Y}$ means that each observation vector $\mathbf{X}_k$
receives a “new name,” $\mathbf{Y}_k$, such that $\mathbf{X}_k = P\mathbf{Y}_k$. Notice that $\mathbf{Y}_k$ is the coordinate vector
of $\mathbf{X}_k$ with respect to the columns of $P$, and $\mathbf{Y}_k = P^{-1}\mathbf{X}_k = P^T\mathbf{X}_k$ for $k = 1, \dots, N$.

It is not difficult to verify that for any orthogonal $P$, the covariance matrix of
$\mathbf{Y}_1, \dots, \mathbf{Y}_N$ is $P^TSP$ (Exercise 11). So the desired orthogonal matrix $P$ is one that
makes $P^TSP$ diagonal. Let $D$ be a diagonal matrix with the eigenvalues $\lambda_1, \dots, \lambda_p$
of $S$ on the diagonal, arranged so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$, and let $P$ be an
orthogonal matrix whose columns are the corresponding unit eigenvectors $\mathbf{u}_1, \dots, \mathbf{u}_p$.
Then $S = PDP^T$ and $P^TSP = D$.

The unit eigenvectors $\mathbf{u}_1, \dots, \mathbf{u}_p$ of the covariance matrix $S$ are called the principal
components of the data (in the matrix of observations). The first principal component
is the eigenvector corresponding to the largest eigenvalue of $S$, the second principal
component is the eigenvector corresponding to the second largest eigenvalue, and so on.

The first principal component $\mathbf{u}_1$ determines the new variable $y_1$ in the following
way. Let $c_1, \dots, c_p$ be the entries in $\mathbf{u}_1$. Since $\mathbf{u}_1^T$ is the first row of $P^T$, the equation
$\mathbf{Y} = P^T\mathbf{X}$ shows that

$$y_1 = \mathbf{u}_1^T\mathbf{X} = c_1x_1 + c_2x_2 + \cdots + c_px_p$$

Thus $y_1$ is a linear combination of the original variables $x_1, \dots, x_p$, using the entries in
the eigenvector $\mathbf{u}_1$ as weights. In a similar fashion, $\mathbf{u}_2$ determines the variable $y_2$, and
so on.
EXAMPLE 4 The initial data for the multispectral image of Railroad Valley
(Example 2) consisted of 4 million vectors in $\mathbb{R}^3$. The associated covariance matrix is¹

$$S = \begin{bmatrix} 2382.78 & 2611.84 & 2136.20 \\ 2611.84 & 3106.47 & 2553.90 \\ 2136.20 & 2553.90 & 2650.71 \end{bmatrix}$$

Find the principal components of the data, and list the new variable determined by the
first principal component.


SOLUTION The eigenvalues of $S$ and the associated principal components (the unit
eigenvectors) are

$$\lambda_1 = 7614.23 \qquad \lambda_2 = 427.63 \qquad \lambda_3 = 98.10$$

$$\mathbf{u}_1 = \begin{bmatrix} .5417 \\ .6295 \\ .5570 \end{bmatrix} \qquad \mathbf{u}_2 = \begin{bmatrix} -.4894 \\ -.3026 \\ .8179 \end{bmatrix} \qquad \mathbf{u}_3 = \begin{bmatrix} .6834 \\ -.7157 \\ .1441 \end{bmatrix}$$

Using two decimal places for simplicity, the variable for the first principal component is

$$y_1 = .54x_1 + .63x_2 + .56x_3$$

This equation was used to create photograph (d) in the chapter introduction. The
variables $x_1$, $x_2$, and $x_3$ are the signal intensities in the three spectral bands. The values
of $x_1$, converted to a gray scale between black and white, produced photograph (a).
Similarly, the values of $x_2$ and $x_3$ produced photographs (b) and (c), respectively. At
each pixel in photograph (d), the gray scale value is computed from $y_1$, a weighted
linear combination of $x_1$, $x_2$, and $x_3$. In this sense, photograph (d) “displays” the first
principal component of the data. ■
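
The eigendata quoted in Example 4 can be sanity-checked, and the new variable $y_1$ evaluated, with a short plain-Python sketch. The residual $S\mathbf{u}_1 - \lambda_1\mathbf{u}_1$ is only approximately zero because the published entries are rounded; the sample `pixel` vector is hypothetical and the function names are illustrative.

```python
# Covariance matrix and first eigenpair as printed in Example 4 (rounded values).
S = [[2382.78, 2611.84, 2136.20],
     [2611.84, 3106.47, 2553.90],
     [2136.20, 2553.90, 2650.71]]
lam1 = 7614.23
u1 = [0.5417, 0.6295, 0.5570]

def matvec(M, x):
    """Return the matrix-vector product M x as a list."""
    return [sum(m * t for m, t in zip(row, x)) for row in M]

# Entries of S u1 - lam1 u1; small values indicate an approximate eigenpair.
Su1 = matvec(S, u1)
residual = [s - lam1 * u for s, u in zip(Su1, u1)]

def first_component_score(x):
    """New variable y1 = u1^T x for one observation vector x."""
    return sum(c * t for c, t in zip(u1, x))

pixel = [100.0, 120.0, 90.0]   # hypothetical intensities in the three bands
y1 = first_component_score(pixel)
print(residual, y1)
```

Applying `first_component_score` to every pixel's observation vector is, in outline, how a single gray-scale value per pixel would be obtained from the three spectral bands.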


In Example 4, the covariance matrix for the transformed data, using variables $y_1$,
$y_2$, and $y_3$, is

$$D = \begin{bmatrix} 7614.23 & 0 & 0 \\ 0 & 427.63 & 0 \\ 0 & 0 & 98.10 \end{bmatrix}$$

Although $D$ is obviously simpler than the original covariance matrix $S$, the merit
of constructing the new variables is not yet apparent. However, the variances of the
variables $y_1$, $y_2$, and $y_3$ appear on the diagonal of $D$, and obviously the first variance
in $D$ is much larger than the other two. As we shall see, this fact will permit us to view
the data as essentially one-dimensional rather than three-dimensional.


Reducing the Dimension of Multivariate Data 

Principal component analysis is potentially valuable for applications in which most of 
the variation, or dynamic range, in the data is due to variations in only a few of the new 
variables, yi,..., y p . 

It can be shown that an orthogonal change of variables, $\mathbf{X} = P\mathbf{Y}$, does not change
the total variance of the data. (Roughly speaking, this is true because left-multiplication
by $P$ does not change the lengths of vectors or the angles between them. See Exercise
12.) This means that if $S = PDP^T$, then

$$\left\{\begin{matrix}\text{total variance} \\ \text{of } x_1, \dots, x_p\end{matrix}\right\} = \left\{\begin{matrix}\text{total variance} \\ \text{of } y_1, \dots, y_p\end{matrix}\right\} = \operatorname{tr}(D) = \lambda_1 + \cdots + \lambda_p$$

The variance of $y_j$ is $\lambda_j$, and the quotient $\lambda_j/\operatorname{tr}(S)$ measures the fraction of the total
variance that is “explained” or “captured” by $y_j$.

1 Data for Example 4 and Exercises 5 and 6 were provided by Earth Satellite Corporation, Rockville,
Maryland.
EXAMPLE 5 Compute the various percentages of variance of the Railroad Valley
multispectral data that are displayed in the principal component photographs, (d)-(f),
shown in the chapter introduction.

SOLUTION The total variance of the data is

$$\operatorname{tr}(D) = 7614.23 + 427.63 + 98.10 = 8139.96$$

[Verify that this number also equals $\operatorname{tr}(S)$.] The percentages of the total variance
explained by the principal components are

First component            Second component          Third component
7614.23/8139.96 = 93.5%    427.63/8139.96 = 5.3%     98.10/8139.96 = 1.2%

In a sense, 93.5% of the information collected by Landsat for the Railroad Valley region
is displayed in photograph (d), with 5.3% in (e) and only 1.2% remaining for (f). ■
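
The percentages in Example 5 are just the quotients $\lambda_j/\operatorname{tr}(D)$. A minimal Python sketch of that arithmetic, using the eigenvalues reported in Example 4 (the variable names are illustrative):

```python
# Variances of y1, y2, y3, i.e. the eigenvalues of S reported in Example 4.
eigenvalues = [7614.23, 427.63, 98.10]

total_variance = sum(eigenvalues)   # equals tr(D)
fractions = [lam / total_variance for lam in eigenvalues]

for k, frac in enumerate(fractions, start=1):
    print(f"component {k}: {100 * frac:.1f}% of total variance")
```

The same computation applies to any covariance matrix once its eigenvalues are known, and the running sum of the fractions indicates how many components are needed to capture a desired share of the variance.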

The calculations in Example 5 show that the data have practically no variance in
the third (new) coordinate. The values of $y_3$ are all close to zero. Geometrically, the
data points lie nearly in the plane $y_3 = 0$, and their locations can be determined fairly
accurately by knowing only the values of $y_1$ and $y_2$. In fact, $y_2$ also has relatively small
variance, which means that the points lie approximately along a line, and the data are
essentially one-dimensional. See Figure 2, in which the data resemble a popsicle stick.


Characterizations of Principal Component Variables 

If $y_1, \dots, y_p$ arise from a principal component analysis of a $p \times N$ matrix of
observations, then the variance of $y_1$ is as large as possible in the following sense: If $\mathbf{u}$ is
any unit vector and if $y = \mathbf{u}^T\mathbf{X}$, then the variance of the values of $y$ as $\mathbf{X}$ varies over
the original data $\mathbf{X}_1, \dots, \mathbf{X}_N$ turns out to be $\mathbf{u}^TS\mathbf{u}$. By Theorem 8 in Section 7.3, the
maximum value of $\mathbf{u}^TS\mathbf{u}$, over all unit vectors $\mathbf{u}$, is the largest eigenvalue $\lambda_1$ of $S$, and
this variance is attained when $\mathbf{u}$ is the corresponding eigenvector $\mathbf{u}_1$. In the same way,
Theorem 8 shows that $y_2$ has maximum possible variance among all variables $y = \mathbf{u}^T\mathbf{X}$
that are uncorrelated with $y_1$. Likewise, $y_3$ has maximum possible variance among all
variables uncorrelated with both $y_1$ and $y_2$, and so on.

NUMERICAL NOTE

The singular value decomposition is the main tool for performing principal
component analysis in practical applications. If $B$ is a $p \times N$ matrix of observations
in mean-deviation form, and if $A = \bigl(1/\sqrt{N-1}\bigr)B^T$, then $A^TA$ is the covariance
matrix, $S$. The squares of the singular values of $A$ are the $p$ eigenvalues of $S$,
and the right singular vectors of $A$ are the principal components of the data.

As mentioned in Section 7.4, iterative calculation of the SVD of $A$ is faster
and more accurate than an eigenvalue decomposition of $S$. This is particularly
true, for instance, in the hyperspectral image processing (with $p = 224$)
mentioned in the chapter introduction. Principal component analysis is completed in
seconds on specialized workstations.
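
The link between the SVD of $A$ and the eigenvalues of $S$ can be checked in two lines. The following is a sketch in the notation of the note, assuming $A = \bigl(1/\sqrt{N-1}\bigr)B^T$ and writing a singular value decomposition as $A = U\Sigma V^T$ (the symbols $U$, $\Sigma$, and $V$ are introduced here only for this check):

```latex
\begin{aligned}
A^{T}A &= \Bigl(\tfrac{1}{\sqrt{N-1}}\,B^{T}\Bigr)^{T}\Bigl(\tfrac{1}{\sqrt{N-1}}\,B^{T}\Bigr)
        = \tfrac{1}{N-1}\,BB^{T} = S, \\
S &= A^{T}A = (U\Sigma V^{T})^{T}(U\Sigma V^{T}) = V\,(\Sigma^{T}\Sigma)\,V^{T}.
\end{aligned}
```

Because $\Sigma^T\Sigma$ is diagonal with entries $\sigma_i^2$ and $V$ is orthogonal, the second line is an orthogonal diagonalization of $S$: its eigenvalues are the squared singular values of $A$, and its unit eigenvectors are the columns of $V$, the right singular vectors.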


Further Reading 

Lillesand, Thomas M., and Ralph W. Kiefer, Remote Sensing and Image Interpretation , 
4th ed. (New York: John Wiley, 2000). 
PRACTICE PROBLEMS


The following table lists the weights and heights of five boys:

Boy             #1    #2    #3    #4    #5
Weight (lb)    120   125   125   135   145
Height (in.)    61    60    64    68    72


1. Find the covariance matrix for the data. 

2. Make a principal component analysis of the data to find a single size index that 
explains most of the variation in the data. 


7.5 EXERCISES 


In Exercises 1 and 2, convert the matrix of observations to mean-deviation
form, and construct the sample covariance matrix.

1. $\begin{bmatrix} 19 & 22 & 6 & 3 & 2 & 20 \\ 12 & 6 & 9 & 15 & 13 & 5 \end{bmatrix}$

2. $\begin{bmatrix} 1 & 5 & 2 & 6 & 7 & 3 \\ 3 & 11 & 6 & 8 & 15 & 11 \end{bmatrix}$

3. Find the principal components of the data for Exercise 1. 

4. Find the principal components of the data for Exercise 2. 

5. [M] A Landsat image with three spectral components was 
made of Homestead Air Force Base in Florida (after the 
base was hit by Hurricane Andrew in 1992). The covariance 
matrix of the data is shown below. Find the first principal 
component of the data, and compute the percentage of the 
total variance that is contained in this component. 



$$\begin{bmatrix} 164.12 & 32.73 & 81.04 \\ 32.73 & 539.44 & 249.13 \\ 81.04 & 249.13 & 189.11 \end{bmatrix}$$


6. [M] The covariance matrix below was obtained from a Landsat
image of the Columbia River in Washington, using data
from three spectral bands. Let $x_1, x_2, x_3$ denote the spectral
components of each pixel in the image. Find a new variable of
the form $y_1 = c_1x_1 + c_2x_2 + c_3x_3$ that has maximum possible
variance, subject to the constraint that $c_1^2 + c_2^2 + c_3^2 = 1$.
What percentage of the total variance in the data is explained
by $y_1$?

$$S = \begin{bmatrix} 29.64 & 18.38 & 5.00 \\ 18.38 & 20.82 & 14.06 \\ 5.00 & 14.06 & 29.21 \end{bmatrix}$$


7. Let $x_1, x_2$ denote the variables for the two-dimensional
data in Exercise 1. Find a new variable $y_1$ of the form
$y_1 = c_1x_1 + c_2x_2$, with $c_1^2 + c_2^2 = 1$, such that $y_1$ has maximum
possible variance over the given data. How much of the
variance in the data is explained by $y_1$?

8. Repeat Exercise 7 for the data in Exercise 2. 


9. Suppose three tests are administered to a random sample
of college students. Let $\mathbf{X}_1, \ldots, \mathbf{X}_N$ be observation vectors
in $\mathbb{R}^3$ that list the three scores of each student, and for
$j = 1, 2, 3$, let $x_j$ denote a student’s score on the $j$th exam.
Suppose the covariance matrix of the data is



Let $y$ be an “index” of student performance, with $y =
c_1x_1 + c_2x_2 + c_3x_3$ and $c_1^2 + c_2^2 + c_3^2 = 1$. Choose $c_1, c_2, c_3$
so that the variance of $y$ over the data set is as large as
possible. [Hint: The eigenvalues of the sample covariance
matrix are $\lambda = 3, 6$, and $9$.]


10. [M] Repeat Exercise 9 with $S = \begin{bmatrix} 5 & 4 & 2 \\ 4 & 11 & 4 \\ 2 & 4 & 5 \end{bmatrix}$.


11. Given multivariate data $\mathbf{X}_1, \ldots, \mathbf{X}_N$ (in $\mathbb{R}^p$) in mean-deviation
form, let $P$ be a $p \times p$ matrix, and define
$\mathbf{Y}_k = P^T\mathbf{X}_k$ for $k = 1, \ldots, N$.

a. Show that $\mathbf{Y}_1, \ldots, \mathbf{Y}_N$ are in mean-deviation form. [Hint:
Let $\mathbf{w}$ be the vector in $\mathbb{R}^N$ with a 1 in each entry. Then
$[\,\mathbf{X}_1 \ \cdots \ \mathbf{X}_N\,]\,\mathbf{w} = \mathbf{0}$ (the zero vector in $\mathbb{R}^p$).]

b. Show that if the covariance matrix of $\mathbf{X}_1, \ldots, \mathbf{X}_N$ is $S$,
then the covariance matrix of $\mathbf{Y}_1, \ldots, \mathbf{Y}_N$ is $P^TSP$.


12. Let $\mathbf{X}$ denote a vector that varies over the columns of a $p \times N$
matrix of observations, and let $P$ be a $p \times p$ orthogonal
matrix. Show that the change of variable $\mathbf{X} = P\mathbf{Y}$ does not
change the total variance of the data. [Hint: By Exercise 11,
it suffices to show that $\operatorname{tr}(P^TSP) = \operatorname{tr}(S)$. Use a property
of the trace mentioned in Exercise 25 in Section 5.4.]


13. The sample covariance matrix is a generalization of a formula
for the variance of a sample of $N$ scalar measurements, say,
$t_1, \ldots, t_N$. If $m$ is the average of $t_1, \ldots, t_N$, then the sample
variance is given by

$$\frac{1}{N-1}\sum_{k=1}^{N}(t_k - m)^2 \tag{1}$$

Show how the sample covariance matrix, $S$, defined prior to
Example 3, may be written in a form similar to (1). [Hint: Use
partitioned matrix multiplication to write $S$ as $1/(N-1)$
times the sum of $N$ matrices of size $p \times p$. For $1 \le k \le N$,
write $\mathbf{X}_k - M$ in place of $\hat{\mathbf{X}}_k$.]


SOLUTIONS TO PRACTICE PROBLEMS 


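1. First arrange the data in mean-deviation form. The sample mean vector is easily
seen to be $M = \begin{bmatrix} 130 \\ 65 \end{bmatrix}$. Subtract $M$ from the observation vectors (the columns in
the table) and obtain

$$B = \begin{bmatrix} -10 & -5 & -5 & 5 & 15 \\ -4 & -5 & -1 & 3 & 7 \end{bmatrix}$$

Then the sample covariance matrix is

$$S = \frac{1}{5-1}\begin{bmatrix} -10 & -5 & -5 & 5 & 15 \\ -4 & -5 & -1 & 3 & 7 \end{bmatrix}\begin{bmatrix} -10 & -4 \\ -5 & -5 \\ -5 & -1 \\ 5 & 3 \\ 15 & 7 \end{bmatrix} = \frac{1}{4}\begin{bmatrix} 400 & 190 \\ 190 & 100 \end{bmatrix} = \begin{bmatrix} 100.0 & 47.5 \\ 47.5 & 25.0 \end{bmatrix}$$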

2. The eigenvalues of $S$ are (to two decimal places)

$\lambda_1 = 123.02$ and $\lambda_2 = 1.98$

The unit eigenvector corresponding to $\lambda_1$ is $\mathbf{u} = \begin{bmatrix} .900 \\ .436 \end{bmatrix}$. (Since $S$ is $2 \times 2$, the
computations can be done by hand if a matrix program is not available.) For the size
index, set

$$y = .900\,\hat{w} + .436\,\hat{h}$$

where $\hat{w}$ and $\hat{h}$ are weight and height, respectively, in mean-deviation form. The
variance of this index over the data set is 123.02. Because the total variance is
$\operatorname{tr}(S) = 100 + 25 = 125$, the size index accounts for practically all (98.4%) of the
variance of the data.
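
Both practice problems can be checked numerically. The sketch below uses only the Python standard library and the closed-form eigenvalues of a symmetric $2 \times 2$ matrix; the function names `covariance_2d` and `principal_component_2d` are hypothetical helpers written for this illustration, not part of any library.

```python
import math

# Data from the practice problem table: one observation per boy.
weights = [120, 125, 125, 135, 145]   # lb
heights = [61, 60, 64, 68, 72]        # in.

def covariance_2d(xs, ys):
    """Sample covariance matrix S = B B^T / (N - 1) for two variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]          # first row of B (mean-deviation form)
    dy = [y - my for y in ys]          # second row of B
    sxx = sum(a * a for a in dx) / (n - 1)
    sxy = sum(a * b for a, b in zip(dx, dy)) / (n - 1)
    syy = sum(b * b for b in dy) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]

def principal_component_2d(S):
    """Eigenvalues (largest first) and a unit eigenvector for the largest one."""
    a, b, d = S[0][0], S[0][1], S[1][1]
    mean, half_gap = (a + d) / 2, math.hypot((a - d) / 2, b)
    lam1, lam2 = mean + half_gap, mean - half_gap
    # (S - lam1 I) v = 0 is solved by v = (b, lam1 - a) when b != 0.
    v = (b, lam1 - a) if b != 0 else ((1.0, 0.0) if a >= d else (0.0, 1.0))
    norm = math.hypot(v[0], v[1])
    return lam1, lam2, (v[0] / norm, v[1] / norm)

S = covariance_2d(weights, heights)
lam1, lam2, u = principal_component_2d(S)
explained = lam1 / (S[0][0] + S[1][1])   # lambda_1 / tr(S)
print(S, round(lam1, 2), round(lam2, 2), u, round(100 * explained, 1))
```

The closed form is specific to the $2 \times 2$ case; for larger covariance matrices a matrix program would be used instead, as the text suggests.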

The original data for Practice Problem 1 and the line determined by the first
principal component $\mathbf{u}$ are shown in Figure 4. (In parametric vector form, the line
is $\mathbf{x} = M + t\mathbf{u}$.) It can be shown that the line is the best approximation to the data,
in the sense that the sum of the squares of the orthogonal distances to the line is
minimized. In fact, principal component analysis is equivalent to what is termed
orthogonal regression, but that is a story for another day.

FIGURE 4 An orthogonal regression line determined by the
first principal component of the data. (Axes: $w$ and $h$, with $h$ in inches.)


CHAPTER 7 SUPPLEMENTARY EXERCISES 


1. Mark each statement True or False. Justify each answer. In
each part, $A$ represents an $n \times n$ matrix.

a. If $A$ is orthogonally diagonalizable, then $A$ is symmetric.

b. If $A$ is an orthogonal matrix, then $A$ is symmetric.

c. If $A$ is an orthogonal matrix, then $\|A\mathbf{x}\| = \|\mathbf{x}\|$ for all $\mathbf{x}$
in $\mathbb{R}^n$.

d. The principal axes of a quadratic form $\mathbf{x}^TA\mathbf{x}$ can be the
columns of any matrix $P$ that diagonalizes $A$.

e. If $P$ is an $n \times n$ matrix with orthogonal columns, then
$P^T = P^{-1}$.

f. If every coefficient in a quadratic form is positive, then
the quadratic form is positive definite.

g. If $\mathbf{x}^TA\mathbf{x} > 0$ for some $\mathbf{x}$, then the quadratic form $\mathbf{x}^TA\mathbf{x}$ is
positive definite.

h. By a suitable change of variable, any quadratic form can
be changed into one with no cross-product term.

i. The largest value of a quadratic form $\mathbf{x}^TA\mathbf{x}$, for $\|\mathbf{x}\| = 1$,
is the largest entry on the diagonal of $A$.

j. The maximum value of a positive definite quadratic form
$\mathbf{x}^TA\mathbf{x}$ is the greatest eigenvalue of $A$.

k. A positive definite quadratic form can be changed into
a negative definite form by a suitable change of variable
$\mathbf{x} = P\mathbf{u}$, for some orthogonal matrix $P$.

l. An indefinite quadratic form is one whose eigenvalues
are not definite.

m. If $P$ is an $n \times n$ orthogonal matrix, then the change of
variable $\mathbf{x} = P\mathbf{u}$ transforms $\mathbf{x}^TA\mathbf{x}$ into a quadratic form
whose matrix is $P^{-1}AP$.

n. If $U$ is $m \times n$ with orthogonal columns, then $UU^T\mathbf{x}$ is the
orthogonal projection of $\mathbf{x}$ onto Col $U$.

o. If $B$ is $m \times n$ and $\mathbf{x}$ is a unit vector in $\mathbb{R}^n$, then $\|B\mathbf{x}\| \le \sigma_1$,
where $\sigma_1$ is the first singular value of $B$.

p. A singular value decomposition of an $m \times n$ matrix $B$
can be written as $B = P\Sigma Q$, where $P$ is an $m \times m$
orthogonal matrix, $Q$ is an $n \times n$ orthogonal matrix, and
$\Sigma$ is an $m \times n$ “diagonal” matrix.

q. If $A$ is $n \times n$, then $A$ and $A^TA$ have the same singular
values.

2. Let $\{\mathbf{u}_1, \ldots, \mathbf{u}_n\}$ be an orthonormal basis for $\mathbb{R}^n$, and let
$\lambda_1, \ldots, \lambda_n$ be any real scalars. Define

$$A = \lambda_1\mathbf{u}_1\mathbf{u}_1^T + \cdots + \lambda_n\mathbf{u}_n\mathbf{u}_n^T$$

a. Show that $A$ is symmetric.

b. Show that $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$.

3. Let $A$ be an $n \times n$ symmetric matrix of rank $r$. Explain why
the spectral decomposition of $A$ represents $A$ as the sum of
$r$ rank 1 matrices.

4. Let $A$ be an $n \times n$ symmetric matrix.

a. Show that $(\operatorname{Col} A)^{\perp} = \operatorname{Nul} A$. [Hint: See Section 6.1.]

b. Show that each $\mathbf{y}$ in $\mathbb{R}^n$ can be written in the form $\mathbf{y} =
\hat{\mathbf{y}} + \mathbf{z}$, with $\hat{\mathbf{y}}$ in Col $A$ and $\mathbf{z}$ in Nul $A$.

5. Show that if $\mathbf{v}$ is an eigenvector of an $n \times n$ matrix $A$ and $\mathbf{v}$
corresponds to a nonzero eigenvalue of $A$, then $\mathbf{v}$ is in Col $A$.
[Hint: Use the definition of an eigenvector.]

6. Let $A$ be an $n \times n$ symmetric matrix. Use Exercise 5 and
an eigenvector basis for $\mathbb{R}^n$ to give a second proof of the
decomposition in Exercise 4(b).

7. Prove that an $n \times n$ matrix $A$ is positive definite if and only
if $A$ admits a Cholesky factorization, namely, $A = R^TR$ for
some invertible upper triangular matrix $R$ whose diagonal
entries are all positive. [Hint: Use a QR factorization and
Exercise 26 in Section 7.2.]

8. Use Exercise 7 to show that if $A$ is positive definite, then
$A$ has an LU factorization, $A = LU$, where $U$ has positive
pivots on its diagonal. (The converse is true, too.)

If $A$ is $m \times n$, then the matrix $G = A^TA$ is called the Gram matrix
of $A$. In this case, the entries of $G$ are the inner products of the
columns of $A$. (See Exercises 9 and 10.)

9. Show that the Gram matrix of any matrix $A$ is positive
semidefinite, with the same rank as $A$. (See the Exercises in
Section 6.5.)

10. Show that if an $n \times n$ matrix $G$ is positive semidefinite and
has rank $r$, then $G$ is the Gram matrix of some $r \times n$ matrix
$A$. This is called a rank-revealing factorization of $G$. [Hint:
Consider the spectral decomposition of $G$, and first write $G$
as $BB^T$ for an $n \times r$ matrix $B$.]

11. Prove that any $n \times n$ matrix $A$ admits a polar decomposition
of the form $A = PQ$, where $P$ is an $n \times n$ positive semidefinite
matrix with the same rank as $A$ and where $Q$ is an $n \times n$
orthogonal matrix. [Hint: Use a singular value decomposition,
$A = U\Sigma V^T$, and observe that $A = (U\Sigma U^T)(UV^T)$.]
This decomposition is used, for instance, in mechanical
engineering to model the deformation of a material. The matrix
$P$ describes the stretching or compression of the material (in
the directions of the eigenvectors of $P$), and $Q$ describes the
rotation of the material in space.
Exercises 12-14 concern an $m \times n$ matrix $A$ with a reduced singular
value decomposition, $A = U_rDV_r^T$, and the pseudoinverse
$A^+ = V_rD^{-1}U_r^T$.

12. Verify the properties of $A^+$:

a. For each $\mathbf{y}$ in $\mathbb{R}^m$, $AA^+\mathbf{y}$ is the orthogonal projection of
$\mathbf{y}$ onto Col $A$.

b. For each $\mathbf{x}$ in $\mathbb{R}^n$, $A^+A\mathbf{x}$ is the orthogonal projection of $\mathbf{x}$
onto Row $A$.

c. $AA^+A = A$ and $A^+AA^+ = A^+$.

13. Suppose the equation $A\mathbf{x} = \mathbf{b}$ is consistent, and let
$\mathbf{x}^+ = A^+\mathbf{b}$. By Exercise 23 in Section 6.3, there is exactly
one vector $\mathbf{p}$ in Row $A$ such that $A\mathbf{p} = \mathbf{b}$. The following steps
prove that $\mathbf{x}^+ = \mathbf{p}$ and $\mathbf{x}^+$ is the minimum length solution of
$A\mathbf{x} = \mathbf{b}$.

a. Show that $\mathbf{x}^+$ is in Row $A$. [Hint: Write $\mathbf{b}$ as $A\mathbf{x}$ for some
$\mathbf{x}$, and use Exercise 12.]

b. Show that $\mathbf{x}^+$ is a solution of $A\mathbf{x} = \mathbf{b}$.

c. Show that if $\mathbf{u}$ is any solution of $A\mathbf{x} = \mathbf{b}$, then
$\|\mathbf{x}^+\| \le \|\mathbf{u}\|$, with equality only if $\mathbf{u} = \mathbf{x}^+$.

14. Given any $\mathbf{b}$ in $\mathbb{R}^m$, adapt Exercise 13 to show that $A^+\mathbf{b}$ is the
least-squares solution of minimum length. [Hint: Consider
the equation $A\mathbf{x} = \hat{\mathbf{b}}$, where $\hat{\mathbf{b}}$ is the orthogonal projection
of $\mathbf{b}$ onto Col $A$.]

[M] In Exercises 15 and 16, construct the pseudoinverse of $A$.
Begin by using a matrix program to produce the SVD of $A$, or, if that
is not available, begin with an orthogonal diagonalization of $A^TA$.
Use the pseudoinverse to solve $A\mathbf{x} = \mathbf{b}$, for $\mathbf{b} = (6, -1, -4, 6)$,
and let $\hat{\mathbf{x}}$ be the solution. Make a calculation to verify that $\hat{\mathbf{x}}$
is in Row $A$. Find a nonzero vector $\mathbf{u}$ in Nul $A$, and verify that
$\|\hat{\mathbf{x}}\| < \|\hat{\mathbf{x}} + \mathbf{u}\|$, which must be true by Exercise 13(c).


15. $A = \begin{bmatrix} -3 & -3 & -6 & 6 & 1 \\ -1 & -1 & -1 & 1 & -2 \\ 0 & 0 & -1 & 1 & -1 \\ 0 & 0 & -1 & 1 & -1 \end{bmatrix}$

16. $A =$


$\begin{bmatrix} 4 & 0 & -1 & -2 & 0 \\ -5 & 0 & 3 & 5 & 0 \\ 2 & 0 & -1 & -2 & 0 \\ 6 & 0 & -3 & -6 & 0 \end{bmatrix}$















The Geometry of 
Vector Spaces 


INTRODUCTORY EXAMPLE 

The Platonic Solids 

In the city of Athens in 387 B.C., the Greek philosopher 
Plato founded an Academy, sometimes referred to as the 
world’s first university. While the curriculum included 
astronomy, biology, political theory, and philosophy, the 
subject closest to his heart was geometry. Indeed, inscribed 
over the doors of his academy were these words: “Let no 
one destitute of geometry enter my doors.” 

The Greeks were greatly impressed by geometric 
patterns such as the regular solids. A polyhedron is called 
regular if its faces are congruent regular polygons and all 
the angles at the vertices are equal. As early as 100 years 
before Plato, the Pythagoreans knew at least three of the 
regular solids: the tetrahedron (4 triangular faces), the cube 
(6 square faces), and the octahedron (8 triangular faces). 
(See Figure 1.) These shapes occur naturally as crystals of 
common minerals. There are only five such regular solids, 
the remaining two being the dodecahedron (12 pentagonal 
faces) and the icosahedron (20 triangular faces). 

Plato discussed the basic theory of these five solids in 
the dialogue Timaeus , and since then they have carried his 
name: the Platonic solids. 

For centuries there was no need to envision geometric
objects in more than three dimensions. But nowadays
mathematicians regularly deal with objects in vector spaces
having four, five, or even hundreds of dimensions. It is not
necessarily clear what geometrical properties one might
ascribe to these objects in higher dimensions.

For example, what properties do lines have in 
2-space and planes have in 3-space that would be useful 
in higher dimensions? How can one characterize such 
objects? Sections 8.1 and 8.4 provide some answers. 
The hyperplanes of Section 8.4 will be important for 
understanding the multidimensional nature of the linear 
programming problems in Chapter 9. 

What would the analogue of a polyhedron “look 
like” in more than three dimensions? A partial answer 
is provided by two-dimensional projections of the four¬ 
dimensional object, created in a manner analogous to two- 
dimensional projections of a three-dimensional object. 
Section 8.5 illustrates this idea for the four-dimensional 
“cube” and the four-dimensional “simplex.” 

The study of geometry in higher dimensions not 
only provides new ways of visualizing abstract algebraic 
concepts, but also creates tools that may be applied in $\mathbb{R}^3$.
For instance, Sections 8.2 and 8.6 include applications to
computer graphics, and Section 8.5 outlines a proof (in
Exercise 22) that there are only five regular polyhedra in
$\mathbb{R}^3$.

FIGURE 1 The five Platonic solids.



Most applications in earlier chapters involved algebraic calculations with subspaces and
linear combinations of vectors. This chapter studies sets of vectors that can be visualized
as geometric objects such as line segments, polygons, and solid objects. Individual
vectors are viewed as points. The concepts introduced here are used in computer
graphics, linear programming (in Chapter 9), and other areas of mathematics.¹

Throughout the chapter, sets of vectors are described by linear combinations, but
with various restrictions on the weights used in the combinations. For instance, in
Section 8.1, the sum of the weights is 1, while in Section 8.2, the weights are positive and
sum to 1. The visualizations are in $\mathbb{R}^2$ or $\mathbb{R}^3$, of course, but the concepts also apply to
$\mathbb{R}^n$ and other vector spaces.


8.1 AFFINE COMBINATIONS 


An affine combination of vectors is a special kind of linear combination. Given
vectors (or “points”) $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_p$ in $\mathbb{R}^n$ and scalars $c_1, \ldots, c_p$, an affine combination of
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_p$ is a linear combination

$$c_1\mathbf{v}_1 + \cdots + c_p\mathbf{v}_p$$

such that the weights satisfy $c_1 + \cdots + c_p = 1$.


¹ See Foley, van Dam, Feiner, and Hughes, Computer Graphics: Principles and Practice, 2nd edition
(Boston: Addison-Wesley, 1996), pp. 1083-1112. That material also discusses coordinate-free “affine
spaces.”
DEFINITION


The set of all affine combinations of points in a set $S$ is called the affine hull (or
affine span) of $S$, denoted by aff $S$.


The affine hull of a single point $\mathbf{v}_1$ is just the set $\{\mathbf{v}_1\}$, since it has the form $c_1\mathbf{v}_1$ where
$c_1 = 1$. The affine hull of two distinct points is often written in a special way. Suppose
$\mathbf{y} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2$ with $c_1 + c_2 = 1$. Write $t$ in place of $c_2$, so that $c_1 = 1 - c_2 = 1 - t$.
Then the affine hull of $\{\mathbf{v}_1, \mathbf{v}_2\}$ is the set

$$\mathbf{y} = (1 - t)\mathbf{v}_1 + t\mathbf{v}_2, \quad \text{with } t \text{ in } \mathbb{R} \tag{1}$$

This set of points includes $\mathbf{v}_1$ (when $t = 0$) and $\mathbf{v}_2$ (when $t = 1$). If $\mathbf{v}_2 = \mathbf{v}_1$, then (1)
again describes just one point. Otherwise, (1) describes the line through $\mathbf{v}_1$ and $\mathbf{v}_2$. To
see this, rewrite (1) in the form

$$\mathbf{y} = \mathbf{v}_1 + t(\mathbf{v}_2 - \mathbf{v}_1) = \mathbf{p} + t\mathbf{u}, \quad \text{with } t \text{ in } \mathbb{R}$$

where $\mathbf{p}$ is $\mathbf{v}_1$ and $\mathbf{u}$ is $\mathbf{v}_2 - \mathbf{v}_1$. The set of all multiples of $\mathbf{u}$ is Span $\{\mathbf{u}\}$, the line through
$\mathbf{u}$ and the origin. Adding $\mathbf{p}$ to each point on this line translates Span $\{\mathbf{u}\}$ into the line
through $\mathbf{p}$ parallel to the line through $\mathbf{u}$ and the origin. See Figure 1. (Compare this
figure with Figure 5 in Section 1.5.)



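Figure 2 uses the original points $\mathbf{v}_1$ and $\mathbf{v}_2$, and displays aff $\{\mathbf{v}_1, \mathbf{v}_2\}$ as the line
through $\mathbf{v}_1$ and $\mathbf{v}_2$.

FIGURE 2 (the line is labeled $\mathbf{y} = \mathbf{v}_1 + t(\mathbf{v}_2 - \mathbf{v}_1)$)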


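Notice that while the point $\mathbf{y}$ in Figure 2 is an affine combination of $\mathbf{v}_1$ and $\mathbf{v}_2$, the
point $\mathbf{y} - \mathbf{v}_1$ equals $t(\mathbf{v}_2 - \mathbf{v}_1)$, which is a linear combination (in fact, a multiple) of
$\mathbf{v}_2 - \mathbf{v}_1$. This relation between $\mathbf{y}$ and $\mathbf{y} - \mathbf{v}_1$ holds for any affine combination of points,
as the following theorem shows.


THEOREM 1

A point $\mathbf{y}$ in $\mathbb{R}^n$ is an affine combination of $\mathbf{v}_1, \ldots, \mathbf{v}_p$ in $\mathbb{R}^n$ if and only if $\mathbf{y} - \mathbf{v}_1$
is a linear combination of the translated points $\mathbf{v}_2 - \mathbf{v}_1, \ldots, \mathbf{v}_p - \mathbf{v}_1$.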
PROOF If $\mathbf{y} - \mathbf{v}_1$ is a linear combination of $\mathbf{v}_2 - \mathbf{v}_1, \ldots, \mathbf{v}_p - \mathbf{v}_1$, there exist weights
$c_2, \ldots, c_p$ such that

$$\mathbf{y} - \mathbf{v}_1 = c_2(\mathbf{v}_2 - \mathbf{v}_1) + \cdots + c_p(\mathbf{v}_p - \mathbf{v}_1) \tag{2}$$

Then

$$\mathbf{y} = (1 - c_2 - \cdots - c_p)\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_p\mathbf{v}_p \tag{3}$$

and the weights in this linear combination sum to 1. So $\mathbf{y}$ is an affine combination of
$\mathbf{v}_1, \ldots, \mathbf{v}_p$. Conversely, suppose

$$\mathbf{y} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_p\mathbf{v}_p \tag{4}$$

where $c_1 + \cdots + c_p = 1$. Since $c_1 = 1 - c_2 - \cdots - c_p$, equation (4) may be written
as in (3), and this leads to (2), which shows that $\mathbf{y} - \mathbf{v}_1$ is a linear combination of
$\mathbf{v}_2 - \mathbf{v}_1, \ldots, \mathbf{v}_p - \mathbf{v}_1$. ■

In the statement of Theorem 1, the point $\mathbf{v}_1$ could be replaced by any of the other
points in the list $\mathbf{v}_1, \ldots, \mathbf{v}_p$. Only the notation in the proof would change.


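EXAMPLE 1 Let $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 2 \\ 5 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 1 \\ 3 \end{bmatrix}$, $\mathbf{v}_4 = \begin{bmatrix} -2 \\ 2 \end{bmatrix}$, and $\mathbf{y} = \begin{bmatrix} 4 \\ 1 \end{bmatrix}$.
If possible, write $\mathbf{y}$ as an affine combination of $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$, and $\mathbf{v}_4$.

SOLUTION Compute the translated points

$$\mathbf{v}_2 - \mathbf{v}_1 = \begin{bmatrix} 1 \\ 3 \end{bmatrix}, \quad \mathbf{v}_3 - \mathbf{v}_1 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad \mathbf{v}_4 - \mathbf{v}_1 = \begin{bmatrix} -3 \\ 0 \end{bmatrix}, \quad \mathbf{y} - \mathbf{v}_1 = \begin{bmatrix} 3 \\ -1 \end{bmatrix}$$

To find scalars $c_2$, $c_3$, and $c_4$ such that

$$c_2(\mathbf{v}_2 - \mathbf{v}_1) + c_3(\mathbf{v}_3 - \mathbf{v}_1) + c_4(\mathbf{v}_4 - \mathbf{v}_1) = \mathbf{y} - \mathbf{v}_1 \tag{5}$$

row reduce the augmented matrix having these points as columns:

$$\begin{bmatrix} 1 & 0 & -3 & 3 \\ 3 & 1 & 0 & -1 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & -3 & 3 \\ 0 & 1 & 9 & -10 \end{bmatrix}$$

This shows that equation (5) is consistent, and the general solution is $c_2 = 3c_4 + 3$,
$c_3 = -9c_4 - 10$, with $c_4$ free. When $c_4 = 0$,

$$\mathbf{y} - \mathbf{v}_1 = 3(\mathbf{v}_2 - \mathbf{v}_1) - 10(\mathbf{v}_3 - \mathbf{v}_1) + 0(\mathbf{v}_4 - \mathbf{v}_1)$$

and

$$\mathbf{y} = 8\mathbf{v}_1 + 3\mathbf{v}_2 - 10\mathbf{v}_3$$

As another example, take $c_4 = 1$. Then $c_2 = 6$ and $c_3 = -19$, so

$$\mathbf{y} - \mathbf{v}_1 = 6(\mathbf{v}_2 - \mathbf{v}_1) - 19(\mathbf{v}_3 - \mathbf{v}_1) + 1(\mathbf{v}_4 - \mathbf{v}_1)$$

and

$$\mathbf{y} = 13\mathbf{v}_1 + 6\mathbf{v}_2 - 19\mathbf{v}_3 + \mathbf{v}_4 \qquad ■$$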
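
The one-parameter family of affine combinations found in Example 1 can be checked with exact arithmetic. The sketch below hard-codes the general solution $c_2 = 3c_4 + 3$, $c_3 = -9c_4 - 10$ and recovers $c_1$ from the requirement that the weights sum to 1; the helper names `affine_weights` and `combine` are introduced here for illustration only.

```python
from fractions import Fraction

# Points from Example 1, written as tuples.
v1, v2, v3, v4 = (1, 2), (2, 5), (1, 3), (-2, 2)
y = (4, 1)

def affine_weights(c4):
    """Weights (c1, c2, c3, c4) from the general solution in Example 1."""
    c4 = Fraction(c4)
    c2 = 3 * c4 + 3
    c3 = -9 * c4 - 10
    c1 = 1 - c2 - c3 - c4          # forces the weights to sum to 1
    return (c1, c2, c3, c4)

def combine(weights, points):
    """Return the linear combination sum_i weights[i] * points[i]."""
    dim = len(points[0])
    return tuple(sum(w * p[i] for w, p in zip(weights, points)) for i in range(dim))

for c4 in (0, 1, Fraction(1, 2)):
    w = affine_weights(c4)
    # Every choice of the free variable gives an affine combination equal to y.
    assert sum(w) == 1
    assert combine(w, (v1, v2, v3, v4)) == y
    print(c4, w)
```

Choosing $c_4 = 0$ and $c_4 = 1$ reproduces the two combinations displayed in the example; any other value of $c_4$ gives yet another valid affine combination.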

While the procedure in Example 1 works for arbitrary points $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_p$ in $\mathbb{R}^n$,
the question can be answered more directly if the chosen points $\mathbf{v}_i$ are a basis for $\mathbb{R}^n$.
For example, let $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ be such a basis. Then any $\mathbf{y}$ in $\mathbb{R}^n$ is a unique linear
combination of $\mathbf{b}_1, \ldots, \mathbf{b}_n$. This combination is an affine combination of the $\mathbf{b}$’s if and
only if the weights sum to 1. (These weights are just the $\mathcal{B}$-coordinates of $\mathbf{y}$, as in
Section 4.4.)
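EXAMPLE 2 Let $\mathbf{b}_1 = \begin{bmatrix} 4 \\ 0 \\ 3 \end{bmatrix}$, $\mathbf{b}_2 = \begin{bmatrix} 0 \\ 4 \\ 2 \end{bmatrix}$, $\mathbf{b}_3 = \begin{bmatrix} 5 \\ 2 \\ 4 \end{bmatrix}$, $\mathbf{p}_1 = \begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix}$, and $\mathbf{p}_2 = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}$.
The set $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3\}$ is a basis for $\mathbb{R}^3$. Determine whether the points $\mathbf{p}_1$ and $\mathbf{p}_2$ are
affine combinations of the points in $\mathcal{B}$.

SOLUTION Find the $\mathcal{B}$-coordinates of $\mathbf{p}_1$ and $\mathbf{p}_2$. These two calculations can be
combined by row reducing the matrix $[\,\mathbf{b}_1 \ \mathbf{b}_2 \ \mathbf{b}_3 \ \mathbf{p}_1 \ \mathbf{p}_2\,]$, with two augmented columns:

$$\begin{bmatrix} 4 & 0 & 5 & 2 & 1 \\ 0 & 4 & 2 & 0 & 2 \\ 3 & 2 & 4 & 0 & 2 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 0 & -2 & \tfrac{2}{3} \\ 0 & 1 & 0 & -1 & \tfrac{2}{3} \\ 0 & 0 & 1 & 2 & -\tfrac{1}{3} \end{bmatrix}$$

Read column 4 to build $\mathbf{p}_1$, and read column 5 to build $\mathbf{p}_2$:

$$\mathbf{p}_1 = -2\mathbf{b}_1 - \mathbf{b}_2 + 2\mathbf{b}_3 \quad \text{and} \quad \mathbf{p}_2 = \tfrac{2}{3}\mathbf{b}_1 + \tfrac{2}{3}\mathbf{b}_2 - \tfrac{1}{3}\mathbf{b}_3$$

The sum of the weights in the linear combination for $\mathbf{p}_1$ is $-1$, not 1, so $\mathbf{p}_1$ is not an
affine combination of the $\mathbf{b}$’s. However, $\mathbf{p}_2$ is an affine combination of the $\mathbf{b}$’s, because
the sum of the weights for $\mathbf{p}_2$ is 1. ■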
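
The row reduction in Example 2 can be reproduced exactly with rational arithmetic. The following is a minimal sketch, not a general-purpose solver: `rref` is a small Gauss-Jordan routine written for this illustration, and the matrix is the augmented matrix $[\,\mathbf{b}_1 \ \mathbf{b}_2 \ \mathbf{b}_3 \ \mathbf{p}_1 \ \mathbf{p}_2\,]$ from the example.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form of a matrix given as a list of rows (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pr = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if pr is None:
            continue
        m[pivot_row], m[pr] = m[pr], m[pivot_row]
        piv = m[pivot_row][col]
        m[pivot_row] = [x / piv for x in m[pivot_row]]
        # Clear the rest of the pivot column.
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m

# Augmented matrix [b1 b2 b3 | p1 p2] from Example 2.
aug = [[4, 0, 5, 2, 1],
       [0, 4, 2, 0, 2],
       [3, 2, 4, 0, 2]]
R = rref(aug)
coords_p1 = [row[3] for row in R]   # B-coordinates of p1
coords_p2 = [row[4] for row in R]   # B-coordinates of p2
print(coords_p1, "sum =", sum(coords_p1))
print(coords_p2, "sum =", sum(coords_p2))
```

A point is an affine combination of the basis points exactly when its coordinate list sums to 1, so only the two sums need to be inspected.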


DEFINITION 


A set $S$ is affine if $\mathbf{p}, \mathbf{q} \in S$ implies that $(1 - t)\mathbf{p} + t\mathbf{q} \in S$ for each real number $t$.


Geometrically, a set is affine if whenever two points are in the set, the entire line 
through these points is in the set. (If S contains only one point, p, then the line through 
p and p is just a point, a “degenerate” line.) Algebraically, for a set S to be affine, 
the definition requires that every affine combination of two points of S belong to S. 
Remarkably, this is equivalent to requiring that S contain every affine combination of 
an arbitrary number of points of S . 


THEOREM 2 


A set S is affine if and only if every affine combination of points of S lies in S . 
That is, S is affine if and only if S = aff S . 


Remark: See the remark prior to Theorem 5 in Chapter 3 regarding mathematical
induction.

PROOF Suppose that $S$ is affine and use induction on the number $m$ of points of $S$
occurring in an affine combination. When $m$ is 1 or 2, an affine combination of $m$ points
of $S$ lies in $S$, by the definition of an affine set. Now, assume that every affine
combination of $k$ or fewer points of $S$ yields a point in $S$, and consider a combination of $k + 1$
points. Take $\mathbf{v}_i$ in $S$ for $i = 1, \ldots, k + 1$, and let $\mathbf{y} = c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k + c_{k+1}\mathbf{v}_{k+1}$,
where $c_1 + \cdots + c_{k+1} = 1$. Since the $c_i$’s sum to 1, at least one of them must not be
equal to 1. By reindexing the $\mathbf{v}_i$ and $c_i$, if necessary, we may assume that $c_{k+1} \ne 1$. Let
$t = c_1 + \cdots + c_k$. Then $t = 1 - c_{k+1} \ne 0$, and

$$\mathbf{y} = (1 - c_{k+1})\left(\frac{c_1}{t}\mathbf{v}_1 + \cdots + \frac{c_k}{t}\mathbf{v}_k\right) + c_{k+1}\mathbf{v}_{k+1} \tag{6}$$

By the induction hypothesis, the point $\mathbf{z} = (c_1/t)\mathbf{v}_1 + \cdots + (c_k/t)\mathbf{v}_k$ is in $S$, since the
coefficients sum to 1. Thus (6) displays $\mathbf{y}$ as an affine combination of two points in $S$,
and so $\mathbf{y} \in S$. By the principle of induction, every affine combination of such points
lies in $S$. That is, aff $S \subset S$. But the reverse inclusion, $S \subset$ aff $S$, always applies. Thus,
when $S$ is affine, $S =$ aff $S$. Conversely, if $S =$ aff $S$, then affine combinations of two
(or more) points of $S$ lie in $S$, so $S$ is affine. ■
The next definition provides terminology for affine sets that emphasizes their close
connection with subspaces of $\mathbb{R}^n$.


DEFINITION

A translate of a set $S$ in $\mathbb{R}^n$ by a vector $\mathbf{p}$ is the set $S + \mathbf{p} = \{\mathbf{s} + \mathbf{p} : \mathbf{s} \in S\}$.² A flat
in $\mathbb{R}^n$ is a translate of a subspace of $\mathbb{R}^n$. Two flats are parallel if one is a translate of
the other. The dimension of a flat is the dimension of the corresponding parallel
subspace. The dimension of a set $S$, written as dim $S$, is the dimension of the
smallest flat containing $S$. A line in $\mathbb{R}^n$ is a flat of dimension 1. A hyperplane in
$\mathbb{R}^n$ is a flat of dimension $n - 1$.


In $\mathbb{R}^3$, the proper subspaces³ consist of the origin $\mathbf{0}$, the set of all lines through $\mathbf{0}$, and
the set of all planes through $\mathbf{0}$. Thus the proper flats in $\mathbb{R}^3$ are points (zero-dimensional),
lines (one-dimensional), and planes (two-dimensional), which may or may not pass
through the origin.

The next theorem shows that these geometric descriptions of lines and planes in $\mathbb{R}^3$
(as translates of subspaces) actually coincide with their earlier algebraic descriptions as
sets of all affine combinations of two or three points, respectively.


THEOREM 3

A nonempty set $S$ is affine if and only if it is a flat.


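Remark: Notice the key role that definitions play in this proof. For example, the first
part assumes that $S$ is affine and seeks to show that $S$ is a flat. By definition, a flat is a
translate of a subspace. By choosing $\mathbf{p}$ in $S$ and defining $W = S + (-\mathbf{p})$, the set $S$ is
translated to the origin and $S = W + \mathbf{p}$. It remains to show that $W$ is a subspace, for
then $S$ will be a translate of a subspace and hence a flat.

PROOF Suppose that $S$ is affine. Let $\mathbf{p}$ be any fixed point in $S$ and let $W = S + (-\mathbf{p})$,
so that $S = W + \mathbf{p}$. To show that $S$ is a flat, it suffices to show that $W$ is a subspace of
$\mathbb{R}^n$. Since $\mathbf{p}$ is in $S$, the zero vector is in $W$. To show that $W$ is closed under sums and
scalar multiples, it suffices to show that if $\mathbf{u}_1$ and $\mathbf{u}_2$ are elements of $W$, then $\mathbf{u}_1 + t\mathbf{u}_2$
is in $W$ for every real $t$. Since $\mathbf{u}_1$ and $\mathbf{u}_2$ are in $W$, there exist $\mathbf{s}_1$ and $\mathbf{s}_2$ in $S$ such that
$\mathbf{u}_1 = \mathbf{s}_1 - \mathbf{p}$ and $\mathbf{u}_2 = \mathbf{s}_2 - \mathbf{p}$. So, for each real $t$,

$$\begin{aligned} \mathbf{u}_1 + t\mathbf{u}_2 &= (\mathbf{s}_1 - \mathbf{p}) + t(\mathbf{s}_2 - \mathbf{p}) \\ &= (1 - t)\mathbf{s}_1 + t(\mathbf{s}_1 + \mathbf{s}_2 - \mathbf{p}) - \mathbf{p} \end{aligned}$$

Let $\mathbf{y} = \mathbf{s}_1 + \mathbf{s}_2 - \mathbf{p}$. Then $\mathbf{y}$ is an affine combination of points in $S$. Since $S$ is affine, $\mathbf{y}$ is
in $S$ (by Theorem 2). But then $(1 - t)\mathbf{s}_1 + t\mathbf{y}$ is also in $S$. So $\mathbf{u}_1 + t\mathbf{u}_2$ is in $-\mathbf{p} + S = W$.
This shows that $W$ is a subspace of $\mathbb{R}^n$. Thus $S$ is a flat, because $S = W + \mathbf{p}$.

Conversely, suppose $S$ is a flat. That is, $S = W + \mathbf{p}$ for some $\mathbf{p} \in \mathbb{R}^n$ and some
subspace $W$. To show that $S$ is affine, it suffices to show that for any pair $\mathbf{s}_1$ and $\mathbf{s}_2$ of
points in $S$, the line through $\mathbf{s}_1$ and $\mathbf{s}_2$ lies in $S$. By definition of $W$, there exist $\mathbf{u}_1$ and
$\mathbf{u}_2$ in $W$ such that $\mathbf{s}_1 = \mathbf{u}_1 + \mathbf{p}$ and $\mathbf{s}_2 = \mathbf{u}_2 + \mathbf{p}$. So, for each real $t$,

$$\begin{aligned} (1 - t)\mathbf{s}_1 + t\mathbf{s}_2 &= (1 - t)(\mathbf{u}_1 + \mathbf{p}) + t(\mathbf{u}_2 + \mathbf{p}) \\ &= (1 - t)\mathbf{u}_1 + t\mathbf{u}_2 + \mathbf{p} \end{aligned}$$

Since $W$ is a subspace, $(1 - t)\mathbf{u}_1 + t\mathbf{u}_2 \in W$ and so $(1 - t)\mathbf{s}_1 + t\mathbf{s}_2 \in W + \mathbf{p} = S$.
Thus $S$ is affine. ■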


² If $\mathbf{p} = \mathbf{0}$, then the translate is just $S$ itself. See Figure 4 in Section 1.5.

³ A subset $A$ of a set $B$ is called a proper subset of $B$ if $A \ne B$. The same condition applies to proper
subspaces and proper flats in $\mathbb{R}^n$: they are not equal to $\mathbb{R}^n$.
Theorem 3 provides a geometric way to view the affine hull of a set: it is the flat that
consists of all the affine combinations of points in the set. For instance, Figure 3 shows
the points studied in Example 2. Although the set of all linear combinations of $\mathbf{b}_1$, $\mathbf{b}_2$,
and $\mathbf{b}_3$ is all of $\mathbb{R}^3$, the set of all affine combinations is only the plane through $\mathbf{b}_1$, $\mathbf{b}_2$,
and $\mathbf{b}_3$. Note that $\mathbf{p}_2$ (from Example 2) is in the plane through $\mathbf{b}_1$, $\mathbf{b}_2$, and $\mathbf{b}_3$, while
$\mathbf{p}_1$ is not in that plane. Also, see Exercise 14.

The next example takes a fresh look at a familiar set: the set of all solutions of a
system $A\mathbf{x} = \mathbf{b}$.

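FIGURE 3

EXAMPLE 3 Suppose that the solutions of an equation $A\mathbf{x} = \mathbf{b}$ are all of the form
$\mathbf{x} = x_3\mathbf{u} + \mathbf{p}$, where $\mathbf{u} = \begin{bmatrix} 2 \\ -3 \\ 1 \end{bmatrix}$ and $\mathbf{p} = \begin{bmatrix} 4 \\ 0 \\ -3 \end{bmatrix}$. Recall from Section 1.5 that this set
is parallel to the solution set of $A\mathbf{x} = \mathbf{0}$, which consists of all points of the form $x_3\mathbf{u}$.
Find points $\mathbf{v}_1$ and $\mathbf{v}_2$ such that the solution set of $A\mathbf{x} = \mathbf{b}$ is aff $\{\mathbf{v}_1, \mathbf{v}_2\}$.

SOLUTION The solution set is a line through $\mathbf{p}$ in the direction of $\mathbf{u}$, as in Figure 1. Since
aff $\{\mathbf{v}_1, \mathbf{v}_2\}$ is a line through $\mathbf{v}_1$ and $\mathbf{v}_2$, identify two points on the line $\mathbf{x} = x_3\mathbf{u} + \mathbf{p}$. Two
simple choices appear when $x_3 = 0$ and $x_3 = 1$. That is, take $\mathbf{v}_1 = \mathbf{p}$ and $\mathbf{v}_2 = \mathbf{u} + \mathbf{p}$,
so that

$$\mathbf{v}_2 = \mathbf{u} + \mathbf{p} = \begin{bmatrix} 2 \\ -3 \\ 1 \end{bmatrix} + \begin{bmatrix} 4 \\ 0 \\ -3 \end{bmatrix} = \begin{bmatrix} 6 \\ -3 \\ -2 \end{bmatrix}$$

In this case, the solution set is described as the set of all affine combinations of the form

$$\mathbf{x} = (1 - x_3)\begin{bmatrix} 4 \\ 0 \\ -3 \end{bmatrix} + x_3\begin{bmatrix} 6 \\ -3 \\ -2 \end{bmatrix} \qquad ■$$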
Earlier, Theorem 1 displayed an important connection between affine combinations
and linear combinations. The next theorem provides another view of affine
combinations, which for $\mathbb{R}^2$ and $\mathbb{R}^3$ is closely connected to applications in computer graphics,
discussed in the next section (and in Section 2.7).


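DEFINITION

For $\mathbf{v}$ in $\mathbb{R}^n$, the standard homogeneous form of $\mathbf{v}$ is the point $\tilde{\mathbf{v}} = \begin{bmatrix} \mathbf{v} \\ 1 \end{bmatrix}$ in $\mathbb{R}^{n+1}$.


THEOREM 4

A point $\mathbf{y}$ in $\mathbb{R}^n$ is an affine combination of $\mathbf{v}_1, \ldots, \mathbf{v}_p$ in $\mathbb{R}^n$ if and only if the
homogeneous form of $\mathbf{y}$ is in Span $\{\tilde{\mathbf{v}}_1, \ldots, \tilde{\mathbf{v}}_p\}$. In fact, $\mathbf{y} = c_1\mathbf{v}_1 + \cdots + c_p\mathbf{v}_p$,
with $c_1 + \cdots + c_p = 1$, if and only if $\tilde{\mathbf{y}} = c_1\tilde{\mathbf{v}}_1 + \cdots + c_p\tilde{\mathbf{v}}_p$.


PROOF A point $\mathbf{y}$ is in aff $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ if and only if there exist weights $c_1, \ldots, c_p$ such
that

$$\begin{bmatrix} \mathbf{y} \\ 1 \end{bmatrix} = c_1\begin{bmatrix} \mathbf{v}_1 \\ 1 \end{bmatrix} + c_2\begin{bmatrix} \mathbf{v}_2 \\ 1 \end{bmatrix} + \cdots + c_p\begin{bmatrix} \mathbf{v}_p \\ 1 \end{bmatrix}$$

This happens if and only if $\tilde{\mathbf{y}}$ is in Span $\{\tilde{\mathbf{v}}_1, \tilde{\mathbf{v}}_2, \ldots, \tilde{\mathbf{v}}_p\}$. ■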
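EXAMPLE 4 Let $\mathbf{v}_1 = \begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 1 \\ 7 \\ 1 \end{bmatrix}$, and $\mathbf{p} = \begin{bmatrix} 4 \\ 3 \\ 0 \end{bmatrix}$. Use
Theorem 4 to write $\mathbf{p}$ as an affine combination of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$, if possible.

SOLUTION Row reduce the augmented matrix for the equation

$$x_1\tilde{\mathbf{v}}_1 + x_2\tilde{\mathbf{v}}_2 + x_3\tilde{\mathbf{v}}_3 = \tilde{\mathbf{p}}$$

To simplify the arithmetic, move the fourth row of 1’s to the top (equivalent to three
row interchanges). After this, the number of arithmetic operations here is basically the
same as the number needed for the method using Theorem 1.

$$[\,\tilde{\mathbf{v}}_1 \ \tilde{\mathbf{v}}_2 \ \tilde{\mathbf{v}}_3 \ \tilde{\mathbf{p}}\,] \sim \begin{bmatrix} 1 & 1 & 1 & 1 \\ 3 & 1 & 1 & 4 \\ 1 & 2 & 7 & 3 \\ 1 & 2 & 1 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & -2 & -2 & 1 \\ 0 & 1 & 6 & 2 \\ 0 & 1 & 0 & -1 \end{bmatrix} \sim \cdots \sim \begin{bmatrix} 1 & 0 & 0 & 1.5 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & .5 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

By Theorem 4, $1.5\mathbf{v}_1 - \mathbf{v}_2 + .5\mathbf{v}_3 = \mathbf{p}$. See Figure 4, which shows the plane that
contains $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$, and $\mathbf{p}$ (together with points on the coordinate axes). ■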
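
The homogeneous-form method of Theorem 4 translates directly into a short program: append a row of 1’s to encode the constraint that the weights sum to 1, then row reduce. The sketch below is an illustration under stated assumptions, not a general solver. The helpers `rref` and `affine_weights` are written for this example, and `affine_weights` only handles the case where the weights are unique (it raises an error when free variables appear).

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form of a matrix given as a list of rows (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        pr = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if pr is None:
            continue
        m[pivot_row], m[pr] = m[pr], m[pivot_row]
        piv = m[pivot_row][col]
        m[pivot_row] = [x / piv for x in m[pivot_row]]
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m

def affine_weights(points, target):
    """Weights c with sum(c) == 1 and sum(c_i * points[i]) == target, or None if none exist."""
    p, n = len(points), len(target)
    # The row of 1's on top encodes the constraint c_1 + ... + c_p = 1.
    aug = [[1] * p + [1]]
    aug += [[pt[i] for pt in points] + [target[i]] for i in range(n)]
    R = rref(aug)
    # The system is inconsistent if some row reads 0 = nonzero.
    if any(all(x == 0 for x in row[:-1]) and row[-1] != 0 for row in R):
        return None
    # This sketch assumes a unique solution: the first p rows form an identity block.
    if len(R) < p or any(R[i][j] != (1 if i == j else 0) for i in range(p) for j in range(p)):
        raise ValueError("weights are not unique; free variables present")
    return [R[i][-1] for i in range(p)]

points = [(3, 1, 1), (1, 2, 2), (1, 7, 1)]     # v1, v2, v3 from Example 4
weights_p = affine_weights(points, (4, 3, 0))  # the point p from Example 4
print(weights_p)
print(affine_weights(points, (0, 0, 0)))       # the origin is not in the plane
```

Returning `None` for the origin reflects that the plane through $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$ does not pass through $\mathbf{0}$, so the origin has no representation with weights summing to 1.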



FIGURE 4 


PRACTICE PROBLEM 


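Plot the points $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} -1 \\ 2 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$, and $\mathbf{p} = \begin{bmatrix} 4 \\ 3 \end{bmatrix}$ on graph paper, and
explain why $\mathbf{p}$ must be an affine combination of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$. Then find the affine
combination for $\mathbf{p}$. [Hint: What is the dimension of aff $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$?]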


8.1 EXERCISES 

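In Exercises 1-4, write $\mathbf{y}$ as an affine combination of the other
points listed, if possible.

1. $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} -2 \\ 2 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 0 \\ 4 \end{bmatrix}$, $\mathbf{v}_4 = \begin{bmatrix} 3 \\ 7 \end{bmatrix}$, $\mathbf{y} = \begin{bmatrix} 5 \\ 3 \end{bmatrix}$

2. $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} -1 \\ 2 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$, $\mathbf{y} = \begin{bmatrix} 5 \\ 7 \end{bmatrix}$

3. $\mathbf{v}_1 = \begin{bmatrix} -3 \\ 1 \\ 1 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 0 \\ 4 \\ -2 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 4 \\ -2 \\ 6 \end{bmatrix}$, $\mathbf{y} = \begin{bmatrix} 17 \\ 1 \\ 5 \end{bmatrix}$

4. $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 2 \\ -6 \\ 7 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 4 \\ 3 \\ 1 \end{bmatrix}$, $\mathbf{y} = \begin{bmatrix} -3 \\ 4 \\ -4 \end{bmatrix}$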
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"2" 


1 


2" 

In Exercises 5 and 6, let bi = 

1 

1 

,b 2 = 

1 

O <N 

_1 

,b 3 = 

-5 

1 


and S = {bi, b 2 ,b 3 }. Note that S is an orthogonal basis for R 3 . 
Write each of the given points as an affine combination of the 
points in the set S , if possible. [Hint: Use Theorem 5 in Section 
6.2 instead of row reduction to find the weights.] 



i 

LO 

_1 



1 

1_ 



i 

o 

1_ 

5. a. p 1 = 

1 

oo 

_1 


b. p 2 = 

1 

i_ 


c. p 3 = 

1 

oi L 

1_ 


1_ 

1 

o 


1.5" 


1 

_l 

6 . a. p 1 = 

1 

Oi <0 

1_ 

b. p 2 = 

-1.3 

-.5 

c. p 3 = 

-4 

0_ 


7. Let 



i 

i _ 


1 

<N 

1 _ 


i 

i _ 


0 


-1 


2 

Vl = 

3 

, V 2 = 

0 

, v 3 = 

1 


i 

o 

_ i 


i 

_ i 


i 

_ i 



i 

_1 


-9" 


1 

^1- 
1_ 


-3 


10 


2 

Pi = 

5 

> P2 = 

9 

> p 3 = 

00 


i 

UJ 

1_ 


i 

OJ 

1_ 


_ 5 _ 


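and $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. It can be shown that $S$ is linearly
independent.

a. Is $\mathbf{p}_1$ in Span $S$? Is $\mathbf{p}_1$ in aff $S$?

b. Is $\mathbf{p}_2$ in Span $S$? Is $\mathbf{p}_2$ in aff $S$?

c. Is $\mathbf{p}_3$ in Span $S$? Is $\mathbf{p}_3$ in aff $S$?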

8 . Repeat Exercise 7 when 



1" 


2" 


3" 


0 


1 


0 

Vl = 

3 

, v 2 = 

6 

, v 3 = 

12 


_ —2 _ 


_ —5 _ 


-6 


4" 


"-5" 




-1 


3 



Pi = 

15 

> P2 = 

-8 

, and p 3 = 


-7 


6 






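9. Suppose that the solutions of an equation $A\mathbf{x} = \mathbf{b}$ are all of
the form $\mathbf{x} = x_3\mathbf{u} + \mathbf{p}$, where $\mathbf{u} = \begin{bmatrix} 4 \\ -2 \end{bmatrix}$ and $\mathbf{p} = \begin{bmatrix} -3 \\ 0 \end{bmatrix}$.
Find points $\mathbf{v}_1$ and $\mathbf{v}_2$ such that the solution set of $A\mathbf{x} = \mathbf{b}$ is
aff$\{\mathbf{v}_1, \mathbf{v}_2\}$.

10. Suppose that the solutions of an equation $A\mathbf{x} = \mathbf{b}$ are all of
the form $\mathbf{x} = x_3\mathbf{u} + \mathbf{p}$, where $\mathbf{u} = \begin{bmatrix} 5 \\ 1 \\ -2 \end{bmatrix}$ and $\mathbf{p} = \begin{bmatrix} 1 \\ -3 \\ 4 \end{bmatrix}$.
Find points $\mathbf{v}_1$ and $\mathbf{v}_2$ such that the solution set of $A\mathbf{x} = \mathbf{b}$ is
aff$\{\mathbf{v}_1, \mathbf{v}_2\}$.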


In Exercises 11 and 12, mark each statement True or False. Justify 
each answer. 


11. a. The set of all affine combinations of points in a set S is 

called the affine hull of S . 

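b. If $\{\mathbf{b}_1, \dots, \mathbf{b}_k\}$ is a linearly independent subset of $\mathbb{R}^n$ and
if $\mathbf{p}$ is a linear combination of $\mathbf{b}_1, \dots, \mathbf{b}_k$, then $\mathbf{p}$ is an
affine combination of $\mathbf{b}_1, \dots, \mathbf{b}_k$.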

c. The affine hull of two distinct points is called a line. 

d. A flat is a subspace. 

e. A plane in R 3 is a hyperplane. 

12. a. If S = {x}, then aff S is the empty set. 

b. A set is affine if and only if it contains its affine hull. 

c. A flat of dimension 1 is called a line. 

d. A flat of dimension 2 is called a hyperplane. 

e. A flat through the origin is a subspace. 

13. Suppose $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a basis for $\mathbb{R}^3$. Show that
Span$\{\mathbf{v}_2 - \mathbf{v}_1, \mathbf{v}_3 - \mathbf{v}_1\}$ is a plane in $\mathbb{R}^3$. [Hint: What can
you say about $\mathbf{u}$ and $\mathbf{v}$ when Span$\{\mathbf{u}, \mathbf{v}\}$ is a plane?]

14. Show that if $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a basis for $\mathbb{R}^3$, then aff$\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$
is the plane through $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$.

15. Let $A$ be an $m \times n$ matrix and, given $\mathbf{b}$ in $\mathbb{R}^m$, show that the
set $S$ of all solutions of $A\mathbf{x} = \mathbf{b}$ is an affine subset of $\mathbb{R}^n$.

16. Let $\mathbf{v} \in \mathbb{R}^n$ and let $k \in \mathbb{R}$. Prove that $S = \{\mathbf{x} \in \mathbb{R}^n : \mathbf{x} \cdot \mathbf{v} = k\}$
is an affine subset of $\mathbb{R}^n$.

17. Choose a set $S$ of three points such that aff $S$ is the plane in
$\mathbb{R}^3$ whose equation is $x_3 = 5$. Justify your work.

18. Choose a set $S$ of four distinct points in $\mathbb{R}^3$ such that aff $S$ is
the plane $2x_1 + x_2 - 3x_3 = 12$. Justify your work.

19. Let $S$ be an affine subset of $\mathbb{R}^n$, suppose $f : \mathbb{R}^n \to \mathbb{R}^m$ is a
linear transformation, and let $f(S)$ denote the set of images
$\{f(\mathbf{x}) : \mathbf{x} \in S\}$. Prove that $f(S)$ is an affine subset of $\mathbb{R}^m$.

20. Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation, let $T$ be an
affine subset of $\mathbb{R}^m$, and let $S = \{\mathbf{x} \in \mathbb{R}^n : f(\mathbf{x}) \in T\}$. Show
that $S$ is an affine subset of $\mathbb{R}^n$.

In Exercises 21-26, prove the given statement about subsets $A$
and $B$ of $\mathbb{R}^n$, or provide the required example in $\mathbb{R}^2$. A proof
for an exercise may use results from earlier exercises (as well as
theorems already available in the text).

21. If $A \subset B$ and $B$ is affine, then aff $A \subset B$.

22. If $A \subset B$, then aff $A \subset$ aff $B$.

23. $[(\text{aff } A) \cup (\text{aff } B)] \subset \text{aff}(A \cup B)$. [Hint: To show that
$D \cup E \subset F$, show that $D \subset F$ and $E \subset F$.]

24. Find an example in $\mathbb{R}^2$ to show that equality need not hold in
the statement of Exercise 23. [Hint: Consider sets $A$ and $B$,
each of which contains only one or two points.]

25. $\text{aff}(A \cap B) \subset (\text{aff } A \cap \text{aff } B)$.

26. Find an example in $\mathbb{R}^2$ to show that equality need not hold in
the statement of Exercise 25.

SOLUTION TO PRACTICE PROBLEM


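Since the points $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ are not collinear (that is, not on a single line),
aff$\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ cannot be one-dimensional. Thus, aff$\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ must equal $\mathbb{R}^2$. To find
the actual weights used to express $\mathbf{p}$ as an affine combination of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$, first
compute

\[
\mathbf{v}_2 - \mathbf{v}_1 = \begin{bmatrix} -2 \\ 2 \end{bmatrix}, \quad
\mathbf{v}_3 - \mathbf{v}_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \quad\text{and}\quad
\mathbf{p} - \mathbf{v}_1 = \begin{bmatrix} 3 \\ 3 \end{bmatrix}
\]

To write $\mathbf{p} - \mathbf{v}_1$ as a linear combination of $\mathbf{v}_2 - \mathbf{v}_1$ and $\mathbf{v}_3 - \mathbf{v}_1$, row reduce the matrix
having these points as columns:

\[
\begin{bmatrix} -2 & 2 & 3 \\ 2 & 1 & 3 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 0 & \tfrac12 \\ 0 & 1 & 2 \end{bmatrix}
\]

Thus $\mathbf{p} - \mathbf{v}_1 = \tfrac12(\mathbf{v}_2 - \mathbf{v}_1) + 2(\mathbf{v}_3 - \mathbf{v}_1)$, which shows that

\[
\mathbf{p} = \left(1 - \tfrac12 - 2\right)\mathbf{v}_1 + \tfrac12\mathbf{v}_2 + 2\mathbf{v}_3 = -\tfrac32\mathbf{v}_1 + \tfrac12\mathbf{v}_2 + 2\mathbf{v}_3
\]

This expresses $\mathbf{p}$ as an affine combination of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$, because the coefficients sum
to 1.

Alternatively, use the method of Example 4 and row reduce:

\[
\begin{bmatrix} \tilde{\mathbf{v}}_1 & \tilde{\mathbf{v}}_2 & \tilde{\mathbf{v}}_3 & \tilde{\mathbf{p}} \end{bmatrix}
\sim
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 3 & 4 \\ 0 & 2 & 1 & 3 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 0 & 0 & -\tfrac32 \\ 0 & 1 & 0 & \tfrac12 \\ 0 & 0 & 1 & 2 \end{bmatrix}
\]

This shows that $\mathbf{p} = -\tfrac32\mathbf{v}_1 + \tfrac12\mathbf{v}_2 + 2\mathbf{v}_3$.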



8.2 AFFINE INDEPENDENCE 


This section continues to explore the relation between linear concepts and affine concepts.
Consider first a set of three vectors in $\mathbb{R}^3$, say $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. If $S$ is linearly
dependent, then one of the vectors is a linear combination of the other two vectors. What
happens when one of the vectors is an affine combination of the others? For instance,
suppose that

\[ \mathbf{v}_3 = (1 - t)\mathbf{v}_1 + t\mathbf{v}_2, \quad \text{for some } t \text{ in } \mathbb{R}. \]

Then

\[ (1 - t)\mathbf{v}_1 + t\mathbf{v}_2 - \mathbf{v}_3 = \mathbf{0}. \]

This is a linear dependence relation because not all the weights are zero. But more is
true: the weights in the dependence relation sum to 0:

\[ (1 - t) + t + (-1) = 0. \]

This is the additional property needed to define affine dependence.


DEFINITION An indexed set of points $\{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ in $\mathbb{R}^n$ is affinely dependent if there exist
real numbers $c_1, \dots, c_p$, not all zero, such that

\[ c_1 + \cdots + c_p = 0 \quad\text{and}\quad c_1\mathbf{v}_1 + \cdots + c_p\mathbf{v}_p = \mathbf{0} \tag{1} \]

Otherwise, the set is affinely independent.

An affine combination is a special type of linear combination, and affine dependence
is a restricted type of linear dependence. Thus, each affinely dependent set is
automatically linearly dependent.

A set $\{\mathbf{v}_1\}$ of only one point (even the zero vector) must be affinely independent
because the required properties of the coefficients $c_i$ cannot be satisfied when there is
only one coefficient. For $\{\mathbf{v}_1\}$, the first equation in (1) is just $c_1 = 0$, and yet at least one
(the only one) coefficient must be nonzero.

Exercise 13 asks you to show that an indexed set $\{\mathbf{v}_1, \mathbf{v}_2\}$ is affinely dependent if
and only if $\mathbf{v}_1 = \mathbf{v}_2$. The following theorem handles the general case and shows how
the concept of affine dependence is analogous to that of linear dependence. Parts (c) and
(d) give useful methods for determining whether a set is affinely dependent. Recall from
Section 8.1 that if $\mathbf{v}$ is in $\mathbb{R}^n$, then the vector $\tilde{\mathbf{v}}$ in $\mathbb{R}^{n+1}$ denotes the homogeneous form
of $\mathbf{v}$.


THEOREM 5 


Given an indexed set $S = \{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ in $\mathbb{R}^n$, with $p \ge 2$, the following statements
are logically equivalent. That is, either they are all true statements or they
are all false.


a. $S$ is affinely dependent.

b. One of the points in $S$ is an affine combination of the other points in $S$.

c. The set $\{\mathbf{v}_2 - \mathbf{v}_1, \dots, \mathbf{v}_p - \mathbf{v}_1\}$ in $\mathbb{R}^n$ is linearly dependent.

d. The set $\{\tilde{\mathbf{v}}_1, \dots, \tilde{\mathbf{v}}_p\}$ of homogeneous forms in $\mathbb{R}^{n+1}$ is linearly dependent.


PROOF Suppose statement (a) is true, and let $c_1, \dots, c_p$ satisfy (1). By renaming the
points if necessary, one may assume that $c_1 \ne 0$ and divide both equations in (1) by $c_1$,
so that $1 + (c_2/c_1) + \cdots + (c_p/c_1) = 0$ and

\[ \mathbf{v}_1 = (-c_2/c_1)\mathbf{v}_2 + \cdots + (-c_p/c_1)\mathbf{v}_p \tag{2} \]

Note that the coefficients on the right side of (2) sum to 1. Thus (a) implies (b). Now,
suppose that (b) is true. By renaming the points if necessary, one may assume that
$\mathbf{v}_1 = c_2\mathbf{v}_2 + \cdots + c_p\mathbf{v}_p$, where $c_2 + \cdots + c_p = 1$. Then

\[ (c_2 + \cdots + c_p)\mathbf{v}_1 = c_2\mathbf{v}_2 + \cdots + c_p\mathbf{v}_p \tag{3} \]

and

\[ c_2(\mathbf{v}_2 - \mathbf{v}_1) + \cdots + c_p(\mathbf{v}_p - \mathbf{v}_1) = \mathbf{0} \tag{4} \]

Not all of $c_2, \dots, c_p$ can be zero because they sum to 1. So (b) implies (c).

Next, if (c) is true, then there exist weights $c_2, \dots, c_p$, not all zero, such that (4)
holds. Rewrite (4) as (3) and set $c_1 = -(c_2 + \cdots + c_p)$. Then $c_1 + \cdots + c_p = 0$. Thus
(3) shows that (1) is true. So (c) implies (a), which proves that (a), (b), and (c) are
logically equivalent. Finally, (d) is equivalent to (a) because the two equations in (1)
are equivalent to the following equation involving the homogeneous forms of the points
in $S$:

\[
c_1 \begin{bmatrix} \mathbf{v}_1 \\ 1 \end{bmatrix} + \cdots + c_p \begin{bmatrix} \mathbf{v}_p \\ 1 \end{bmatrix} = \begin{bmatrix} \mathbf{0} \\ 0 \end{bmatrix}
\]

In statement (c) of Theorem 5, $\mathbf{v}_1$ could be replaced by any of the other points in
the list $\mathbf{v}_1, \dots, \mathbf{v}_p$. Only the notation in the proof would change. So, to test whether a
set is affinely dependent, subtract one point in the set from the other points, and check
whether the translated set of $p - 1$ points is linearly dependent.

EXAMPLE 1 The affine hull of two distinct points $\mathbf{p}$ and $\mathbf{q}$ is a line. If a third point
$\mathbf{r}$ is on the line, then $\{\mathbf{p}, \mathbf{q}, \mathbf{r}\}$ is an affinely dependent set. If a point $\mathbf{s}$ is not on the line
through $\mathbf{p}$ and $\mathbf{q}$, then these three points are not collinear and $\{\mathbf{p}, \mathbf{q}, \mathbf{s}\}$ is an affinely
independent set. See Figure 1. ■


FIGURE 1 $\{\mathbf{p}, \mathbf{q}, \mathbf{r}\}$ is affinely dependent.
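
The test in statement (c) of Theorem 5 (translate by one point, then check linear dependence) can be sketched as follows. This Python sketch is an illustration under stated assumptions: the helper names `rank` and `is_affinely_dependent` are hypothetical, coordinates are assumed to be integers or `Fraction` values so that the arithmetic is exact, and the sample points are invented for the illustration.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors, found by forward elimination in exact arithmetic."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_affinely_dependent(points):
    """Theorem 5(c): subtract the first point and test the rest for linear dependence."""
    base = points[0]
    translated = [[x - b for x, b in zip(v, base)] for v in points[1:]]
    return rank(translated) < len(translated)

# Three collinear points (dependent) and three noncollinear points (independent).
collinear = is_affinely_dependent([(0, 0), (2, 2), (3, 3)])
noncollinear = is_affinely_dependent([(0, 0), (2, 2), (1, 0)])
```

With three points in the plane, this is the collinearity check described in Example 1: the set is affinely dependent exactly when the two translated points are multiples of one another.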


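EXAMPLE 2 Let $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 3 \\ 7 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 2 \\ 7 \\ 6.5 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 0 \\ 4 \\ 7 \end{bmatrix}$, and $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$.
Determine whether $S$ is affinely independent.

SOLUTION Compute $\mathbf{v}_2 - \mathbf{v}_1 = \begin{bmatrix} 1 \\ 4 \\ -.5 \end{bmatrix}$ and $\mathbf{v}_3 - \mathbf{v}_1 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}$. These two points are
not multiples and hence form a linearly independent set, $S'$. So all statements in Theorem
5 are false, and $S$ is affinely independent. Figure 2 shows $S$ and the translated set $S'$.
Notice that Span $S'$ is a plane through the origin and aff $S$ is a parallel plane through $\mathbf{v}_1$,
$\mathbf{v}_2$, and $\mathbf{v}_3$. (Only a portion of each plane is shown here, of course.) ■


FIGURE 2 An affinely independent set $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$.


EXAMPLE 3 Let $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 3 \\ 7 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 2 \\ 7 \\ 6.5 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 0 \\ 4 \\ 7 \end{bmatrix}$, and $\mathbf{v}_4 = \begin{bmatrix} 0 \\ 14 \\ 6 \end{bmatrix}$, and let
$S = \{\mathbf{v}_1, \dots, \mathbf{v}_4\}$. Is $S$ affinely dependent?

SOLUTION Compute $\mathbf{v}_2 - \mathbf{v}_1 = \begin{bmatrix} 1 \\ 4 \\ -.5 \end{bmatrix}$, $\mathbf{v}_3 - \mathbf{v}_1 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}$, and $\mathbf{v}_4 - \mathbf{v}_1 = \begin{bmatrix} -1 \\ 11 \\ -1 \end{bmatrix}$,
and row reduce the matrix:

\[
\begin{bmatrix} 1 & -1 & -1 \\ 4 & 1 & 11 \\ -.5 & 0 & -1 \end{bmatrix}
\sim
\begin{bmatrix} 1 & -1 & -1 \\ 0 & 5 & 15 \\ 0 & -.5 & -1.5 \end{bmatrix}
\sim
\begin{bmatrix} 1 & -1 & -1 \\ 0 & 5 & 15 \\ 0 & 0 & 0 \end{bmatrix}
\]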

Recall from Section 4.6 (or Section 2.8) that the columns are linearly dependent because
not every column is a pivot column; so $\mathbf{v}_2 - \mathbf{v}_1$, $\mathbf{v}_3 - \mathbf{v}_1$, and $\mathbf{v}_4 - \mathbf{v}_1$ are linearly
dependent. By statement (c) in Theorem 5, $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$ is affinely dependent. This
dependence can also be established using (d) in Theorem 5 instead of (c). ■

The calculations in Example 3 show that $\mathbf{v}_4 - \mathbf{v}_1$ is a linear combination of $\mathbf{v}_2 - \mathbf{v}_1$
and $\mathbf{v}_3 - \mathbf{v}_1$, which means that $\mathbf{v}_4 - \mathbf{v}_1$ is in Span$\{\mathbf{v}_2 - \mathbf{v}_1, \mathbf{v}_3 - \mathbf{v}_1\}$. By Theorem 1 in
Section 8.1, $\mathbf{v}_4$ is in aff$\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. In fact, complete row reduction of the matrix in
Example 3 would show that

\[ \mathbf{v}_4 - \mathbf{v}_1 = 2(\mathbf{v}_2 - \mathbf{v}_1) + 3(\mathbf{v}_3 - \mathbf{v}_1) \tag{5} \]

\[ \mathbf{v}_4 = -4\mathbf{v}_1 + 2\mathbf{v}_2 + 3\mathbf{v}_3 \tag{6} \]

See Figure 3.


FIGURE 3 $\mathbf{v}_4$ is in the plane aff$\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$.


Figure 3 shows grids on both Span$\{\mathbf{v}_2 - \mathbf{v}_1, \mathbf{v}_3 - \mathbf{v}_1\}$ and aff$\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. The grid
on aff$\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is based on (5). Another "coordinate system" can be based on (6), in
which the coefficients $-4$, $2$, and $3$ are called affine or barycentric coordinates of $\mathbf{v}_4$.


Barycentric Coordinates

The definition of barycentric coordinates depends on the following affine version of the
Unique Representation Theorem in Section 4.4. See Exercise 17 in this section for the
proof.


THEOREM 6


Let $S = \{\mathbf{v}_1, \dots, \mathbf{v}_k\}$ be an affinely independent set in $\mathbb{R}^n$. Then each $\mathbf{p}$ in aff $S$
has a unique representation as an affine combination of $\mathbf{v}_1, \dots, \mathbf{v}_k$. That is, for
each $\mathbf{p}$ there exists a unique set of scalars $c_1, \dots, c_k$ such that

\[ \mathbf{p} = c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k \quad\text{and}\quad c_1 + \cdots + c_k = 1 \tag{7} \]


DEFINITION


Let $S = \{\mathbf{v}_1, \dots, \mathbf{v}_k\}$ be an affinely independent set. Then for each point $\mathbf{p}$ in
aff $S$, the coefficients $c_1, \dots, c_k$ in the unique representation (7) of $\mathbf{p}$ are called
the barycentric (or, sometimes, affine) coordinates of $\mathbf{p}$.


Observe that (7) is equivalent to the single equation

\[
\begin{bmatrix} \mathbf{p} \\ 1 \end{bmatrix}
= c_1 \begin{bmatrix} \mathbf{v}_1 \\ 1 \end{bmatrix} + \cdots + c_k \begin{bmatrix} \mathbf{v}_k \\ 1 \end{bmatrix}
\tag{8}
\]

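involving the homogeneous forms of the points. Row reduction of the augmented matrix
$\begin{bmatrix} \tilde{\mathbf{v}}_1 & \cdots & \tilde{\mathbf{v}}_k & \tilde{\mathbf{p}} \end{bmatrix}$ for (8) produces the barycentric coordinates of $\mathbf{p}$.

EXAMPLE 4 Let $\mathbf{a} = \begin{bmatrix} 1 \\ 7 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} 3 \\ 0 \end{bmatrix}$, $\mathbf{c} = \begin{bmatrix} 9 \\ 3 \end{bmatrix}$, and $\mathbf{p} = \begin{bmatrix} 5 \\ 3 \end{bmatrix}$. Find the barycentric
coordinates of $\mathbf{p}$ determined by the affinely independent set $\{\mathbf{a}, \mathbf{b}, \mathbf{c}\}$.

SOLUTION Row reduce the augmented matrix of points in homogeneous form, moving
the last row of ones to the top to simplify the arithmetic:

\[
\begin{bmatrix} \tilde{\mathbf{a}} & \tilde{\mathbf{b}} & \tilde{\mathbf{c}} & \tilde{\mathbf{p}} \end{bmatrix}
=
\begin{bmatrix} 1 & 3 & 9 & 5 \\ 7 & 0 & 3 & 3 \\ 1 & 1 & 1 & 1 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 3 & 9 & 5 \\ 7 & 0 & 3 & 3 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 0 & 0 & \tfrac14 \\ 0 & 1 & 0 & \tfrac13 \\ 0 & 0 & 1 & \tfrac{5}{12} \end{bmatrix}
\]

The coordinates are $\tfrac14$, $\tfrac13$, and $\tfrac{5}{12}$, so $\mathbf{p} = \tfrac14\mathbf{a} + \tfrac13\mathbf{b} + \tfrac{5}{12}\mathbf{c}$.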


Barycentric coordinates have both physical and geometric interpretations. They
were originally defined by A. F. Moebius in 1827 for a point $\mathbf{p}$ inside a triangular
region with vertices $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$. He wrote that the barycentric coordinates of $\mathbf{p}$ are
three nonnegative numbers $m_a$, $m_b$, and $m_c$ such that $\mathbf{p}$ is the center of mass of a system
consisting of the triangle (with no mass) and masses $m_a$, $m_b$, and $m_c$ at the corresponding
vertices. The masses are uniquely determined by requiring that their sum be 1. This view
is still useful in physics today.$^1$

Figure 4 gives a geometric interpretation to the barycentric coordinates in Example
4, showing the triangle $\triangle\mathbf{abc}$ and three small triangles $\triangle\mathbf{pbc}$, $\triangle\mathbf{apc}$, and $\triangle\mathbf{abp}$. The
areas of the small triangles are proportional to the barycentric coordinates of $\mathbf{p}$. In fact,

\[
\begin{aligned}
\text{area}(\triangle\mathbf{pbc}) &= \tfrac14 \cdot \text{area}(\triangle\mathbf{abc}) \\
\text{area}(\triangle\mathbf{apc}) &= \tfrac13 \cdot \text{area}(\triangle\mathbf{abc}) \\
\text{area}(\triangle\mathbf{abp}) &= \tfrac{5}{12} \cdot \text{area}(\triangle\mathbf{abc})
\end{aligned}
\]

The formulas in Figure 4 are verified in Exercises 21-23. Analogous equalities for
volumes of tetrahedrons hold for the case when $\mathbf{p}$ is a point inside a tetrahedron in $\mathbb{R}^3$,
with vertices $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$, and $\mathbf{d}$.
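
The area interpretation suggests a direct way to compute barycentric coordinates for a triangle in the plane. The following Python sketch is an illustration, not a method stated in the text: the function names are hypothetical, and it assumes the determinant-based area formula of Exercise 21, using signed areas so that the same ratios also work for points outside the triangle.

```python
from fractions import Fraction

def signed_area(a, b, c):
    """Half the determinant of the homogeneous forms of a, b, c (signed triangle area)."""
    det = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    return Fraction(det) / 2

def barycentric_by_area(a, b, c, p):
    """Barycentric coordinates (r, s, t) of p as ratios of signed areas."""
    total = signed_area(a, b, c)
    if total == 0:
        raise ValueError("a, b, c are collinear, so they are affinely dependent")
    r = signed_area(p, b, c) / total   # p replaces a
    s = signed_area(a, p, c) / total   # p replaces b
    t = signed_area(a, b, p) / total   # p replaces c
    return r, s, t

# Data of Example 4: the ratios reproduce the coordinates 1/4, 1/3, 5/12.
coords = barycentric_by_area((1, 7), (3, 0), (9, 3), (5, 3))
```

Each coordinate is obtained by replacing the corresponding vertex with the point and comparing areas, which is the statement of the three area formulas above.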


1 See Exercise 29 in Section 1.3. In astronomy, however, "barycentric coordinates" usually refer to ordinary
$\mathbb{R}^3$ coordinates of points in what is now called the International Celestial Reference System, a Cartesian
coordinate system for outer space, with the origin at the center of mass (the barycenter) of the solar system.

When a point is not inside the triangle (or tetrahedron), some of the barycentric
coordinates will be negative. The case of a triangle is illustrated in Figure 5, for vertices
$\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$, and coordinate values $r$, $s$, $t$, as above. The points on the line through $\mathbf{b}$ and $\mathbf{c}$, for
instance, have $r = 0$ because they are affine combinations of only $\mathbf{b}$ and $\mathbf{c}$. The parallel
line through $\mathbf{a}$ identifies points with $r = 1$.



FIGURE 5 Barycentric coordinates 
for points in aff {a, b, c}. 


Barycentric Coordinates in Computer Graphics 

When working with geometric objects in a computer graphics program, a designer may
use a "wire-frame" approximation to an object at certain key points in the process
of creating a realistic final image.$^2$ For instance, if the surface of part of an object
consists of small flat triangular surfaces, then a graphics program can easily add color,
lighting, and shading to each small surface when that information is known only at the
vertices. Barycentric coordinates provide the tool for smoothly interpolating the vertex
information over the interior of a triangle. The interpolation at a point is simply the
linear combination of the vertex values using the barycentric coordinates as weights.

Colors on a computer screen are often described by RGB coordinates. A triple
$(r, g, b)$ indicates the amount of each color (red, green, and blue) with the parameters
varying from 0 to 1. For example, pure red is $(1, 0, 0)$, white is $(1, 1, 1)$, and black is
$(0, 0, 0)$.


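EXAMPLE 5 Let $\mathbf{v}_1 = \begin{bmatrix} 3 \\ 1 \\ 5 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 4 \\ 3 \\ 4 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 1 \\ 5 \\ 1 \end{bmatrix}$, and $\mathbf{p} = \begin{bmatrix} 3 \\ 3 \\ 3.5 \end{bmatrix}$. The colors
at the vertices $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ of a triangle are magenta $(1, 0, 1)$, light magenta $(1, .4, 1)$,
and purple $(.6, 0, 1)$, respectively. Find the interpolated color at $\mathbf{p}$. See Figure 6.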



FIGURE 6 Interpolated colors. 


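2 The Introductory Example for Chapter 2 shows a wire-frame model of a Boeing 777 airplane, used to
visualize the flow of air over the surface of the plane.

SOLUTION First, find the barycentric coordinates of $\mathbf{p}$. Here is the calculation using
homogeneous forms of the points, with the first step moving row 4 to row 1:

\[
\begin{bmatrix} \tilde{\mathbf{v}}_1 & \tilde{\mathbf{v}}_2 & \tilde{\mathbf{v}}_3 & \tilde{\mathbf{p}} \end{bmatrix}
\sim
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 3 & 4 & 1 & 3 \\ 1 & 3 & 5 & 3 \\ 5 & 4 & 1 & 3.5 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 0 & 0 & .25 \\ 0 & 1 & 0 & .50 \\ 0 & 0 & 1 & .25 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]

So $\mathbf{p} = .25\mathbf{v}_1 + .5\mathbf{v}_2 + .25\mathbf{v}_3$. Use the barycentric coordinates of $\mathbf{p}$ to make a linear
combination of the color data. The RGB values for $\mathbf{p}$ are

\[
.25 \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}
+ .50 \begin{bmatrix} 1 \\ .4 \\ 1 \end{bmatrix}
+ .25 \begin{bmatrix} .6 \\ 0 \\ 1 \end{bmatrix}
= \begin{bmatrix} .9 \\ .2 \\ 1 \end{bmatrix}
\begin{matrix} \text{red} \\ \text{green} \\ \text{blue} \end{matrix}
\]

■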

One of the last steps in preparing a graphics scene for display on a computer screen 
is to remove “hidden surfaces” that should not be visible on the screen. Imagine the 
viewing screen as consisting of, say, a million pixels, and consider a ray or “line of sight” 
from the viewer’s eye through a pixel and into the collection of objects that make up the 
3D display. The color and other information displayed in the pixel on the screen should 
come from the object that the ray first intersects. See Figure 7. When the objects in 
the graphics scene are approximated by wire frames with triangular patches, the hidden 
surface problem can be solved using barycentric coordinates. 



FIGURE 7 A ray from the eye through the screen to the 
nearest object. 


The mathematics for finding the ray-triangle intersections can also be used to perform
extremely realistic shading of objects. Currently, this ray-tracing method is too
slow for real-time rendering, but recent advances in hardware implementation may
change that in the future.$^3$


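EXAMPLE 6 Let

\[
\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \\ -6 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} 8 \\ 1 \\ -4 \end{bmatrix}, \quad
\mathbf{v}_3 = \begin{bmatrix} 5 \\ 11 \\ -2 \end{bmatrix}, \quad
\mathbf{a} = \begin{bmatrix} 0 \\ 0 \\ 10 \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} .7 \\ .4 \\ -3 \end{bmatrix},
\]

and $\mathbf{x}(t) = \mathbf{a} + t\mathbf{b}$ for $t \ge 0$. Find the point where the ray $\mathbf{x}(t)$ intersects the plane that
contains the triangle with vertices $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$. Is this point inside the triangle?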


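3 See Joshua Fender and Jonathan Rose, "A High-Speed Ray Tracing Engine Built on a Field-Programmable
System," in Proc. Int. Conf. on Field-Programmable Technology, IEEE (2003). (A single processor can
calculate 600 million ray-triangle intersections per second.)

SOLUTION The plane is aff$\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. A typical point in this plane may be written
as $(1 - c_2 - c_3)\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$ for some $c_2$ and $c_3$. (The weights in this combination
sum to 1.) The ray $\mathbf{x}(t)$ intersects the plane when $c_2$, $c_3$, and $t$ satisfy

\[ (1 - c_2 - c_3)\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3 = \mathbf{a} + t\mathbf{b} \]

Rearrange this as $c_2(\mathbf{v}_2 - \mathbf{v}_1) + c_3(\mathbf{v}_3 - \mathbf{v}_1) + t(-\mathbf{b}) = \mathbf{a} - \mathbf{v}_1$. In matrix form,

\[
\begin{bmatrix} \mathbf{v}_2 - \mathbf{v}_1 & \mathbf{v}_3 - \mathbf{v}_1 & -\mathbf{b} \end{bmatrix}
\begin{bmatrix} c_2 \\ c_3 \\ t \end{bmatrix}
= \mathbf{a} - \mathbf{v}_1
\]

For the specific points given here,

\[
\mathbf{v}_2 - \mathbf{v}_1 = \begin{bmatrix} 7 \\ 0 \\ 2 \end{bmatrix}, \quad
\mathbf{v}_3 - \mathbf{v}_1 = \begin{bmatrix} 4 \\ 10 \\ 4 \end{bmatrix}, \quad
\mathbf{a} - \mathbf{v}_1 = \begin{bmatrix} -1 \\ -1 \\ 16 \end{bmatrix}
\]

Row reduction of the augmented matrix above produces

\[
\begin{bmatrix} 7 & 4 & -.7 & -1 \\ 0 & 10 & -.4 & -1 \\ 2 & 4 & 3 & 16 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 0 & 0 & .3 \\ 0 & 1 & 0 & .1 \\ 0 & 0 & 1 & 5 \end{bmatrix}
\]

Thus $c_2 = .3$, $c_3 = .1$, and $t = 5$. Therefore, the intersection point is

\[
\mathbf{x}(5) = \mathbf{a} + 5\mathbf{b} = \begin{bmatrix} 0 \\ 0 \\ 10 \end{bmatrix} + 5 \begin{bmatrix} .7 \\ .4 \\ -3 \end{bmatrix} = \begin{bmatrix} 3.5 \\ 2.0 \\ -5.0 \end{bmatrix}
\]

Also,

\[
\mathbf{x}(5) = (1 - .3 - .1)\mathbf{v}_1 + .3\mathbf{v}_2 + .1\mathbf{v}_3
= .6 \begin{bmatrix} 1 \\ 1 \\ -6 \end{bmatrix} + .3 \begin{bmatrix} 8 \\ 1 \\ -4 \end{bmatrix} + .1 \begin{bmatrix} 5 \\ 11 \\ -2 \end{bmatrix}
= \begin{bmatrix} 3.5 \\ 2.0 \\ -5.0 \end{bmatrix}
\]

The intersection point is inside the triangle because the barycentric weights for $\mathbf{x}(5)$ are
all positive. ■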
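
The calculation in Example 6 can be sketched in code. In the following Python sketch the $3 \times 3$ system is solved by Cramer's rule rather than by the row reduction used in the example (a substitution made here for brevity), the function names are hypothetical, and the entries of $\mathbf{b}$ are entered as exact fractions.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def ray_triangle(v1, v2, v3, a, b):
    """Solve c2(v2 - v1) + c3(v3 - v1) + t(-b) = a - v1.

    Returns (c1, c2, c3, t), or None when the ray is parallel to the plane.
    """
    col1 = [x - y for x, y in zip(v2, v1)]
    col2 = [x - y for x, y in zip(v3, v1)]
    col3 = [-x for x in b]
    rhs = [x - y for x, y in zip(a, v1)]

    def matrix(c1, c2, c3):
        # Assemble a 3x3 matrix (list of rows) from three column vectors.
        return [[c1[i], c2[i], c3[i]] for i in range(3)]

    d = det3(matrix(col1, col2, col3))
    if d == 0:
        return None
    c2 = det3(matrix(rhs, col2, col3)) / d
    c3 = det3(matrix(col1, rhs, col3)) / d
    t = det3(matrix(col1, col2, rhs)) / d
    return 1 - c2 - c3, c2, c3, t

def hits_triangle(v1, v2, v3, a, b):
    """True when the ray a + t b, t >= 0, meets the closed triangle v1 v2 v3."""
    result = ray_triangle(v1, v2, v3, a, b)
    if result is None:
        return False
    c1, c2, c3, t = result
    return t >= 0 and min(c1, c2, c3) >= 0

# Data of Example 6, with b = (.7, .4, -3) written exactly.
b = (Fraction(7, 10), Fraction(2, 5), Fraction(-3))
solution = ray_triangle((1, 1, -6), (8, 1, -4), (5, 11, -2), (0, 0, 10), b)
```

For the data of Example 6, the tuple `solution` holds the barycentric weights .6, .3, .1 and the parameter value $t = 5$, and `hits_triangle` reports that the intersection lies in the triangle.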


PRACTICE PROBLEMS 


1. Describe a fast way to determine when three points are collinear.

2. The points $\mathbf{v}_1 = \begin{bmatrix} 4 \\ 1 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 5 \\ 4 \end{bmatrix}$, and $\mathbf{v}_4 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ form an affinely dependent
set. Find weights $c_1, \dots, c_4$ that produce an affine dependence relation
$c_1\mathbf{v}_1 + \cdots + c_4\mathbf{v}_4 = \mathbf{0}$, where $c_1 + \cdots + c_4 = 0$ and not all $c_i$ are zero. [Hint: See
the end of the proof of Theorem 5.]


8.2 EXERCISES


In Exercises 1-6, determine if the set of points is affinely dependent.
(See Practice Problem 2.) If so, construct an affine dependence
relation for the points.






1 


1 

<N 

1 _ 


1 

<N 

1 _ 


1 

0 

1 _ 

3. 

2 

5 

-4 

5 

-1 

5 

15 


-1 


1 

00 

1 _ 


11 


1 

cp 

_ 1 


1 

<N 

1 _ 


1 

0 

1 _ 


1 


1 

<N 

1 _ 

4. 

5 

5 

-3 

5 

-2 

5 

7 


1 

LO 

1_ 


7 


-6 


1 

cn 

_l 



1 



1 

O 

1_ 


-1 




1 

0 

5. 

0 

5 

1 

5 

5 



5 


1 

_1 



1 


1 




1 

OJ 

1_ 


1 


—1 

0 
_1 


1 

_1 



_1 


6. 

3 

5 

-1 

5 

5 

5 


5 



1 


1 

<N 

_1 


1 

<N 

_1 


1— 
0 

1_ 



In Exercises 7 and 8, find the barycentric coordinates of p with 
respect to the affinely independent set of points that precedes it. 



In Exercises 9 and 10, mark each statement True or False. Justify
each answer.

9. a. If $\mathbf{v}_1, \dots, \mathbf{v}_p$ are in $\mathbb{R}^n$ and if the set $\{\mathbf{v}_1 - \mathbf{v}_2, \mathbf{v}_3 - \mathbf{v}_2, \dots,
\mathbf{v}_p - \mathbf{v}_2\}$ is linearly dependent, then $\{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ is
affinely dependent. (Read this carefully.)

b. If $\mathbf{v}_1, \dots, \mathbf{v}_p$ are in $\mathbb{R}^n$ and if the set of homogeneous
forms $\{\tilde{\mathbf{v}}_1, \dots, \tilde{\mathbf{v}}_p\}$ in $\mathbb{R}^{n+1}$ is linearly independent, then
$\{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ is affinely dependent.

c. A finite set of points $\{\mathbf{v}_1, \dots, \mathbf{v}_k\}$ is affinely dependent if
there exist real numbers $c_1, \dots, c_k$, not all zero, such that
$c_1 + \cdots + c_k = 1$ and $c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k = \mathbf{0}$.

d. If $S = \{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ is an affinely independent set in $\mathbb{R}^n$
and if $\mathbf{p}$ in $\mathbb{R}^n$ has a negative barycentric coordinate
determined by $S$, then $\mathbf{p}$ is not in aff $S$.

e. If $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$, $\mathbf{a}$, and $\mathbf{b}$ are in $\mathbb{R}^3$ and if a ray $\mathbf{a} + t\mathbf{b}$ for
$t \ge 0$ intersects the triangle with vertices $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$,
then the barycentric coordinates of the intersection point
are all nonnegative.

10. a. If $\{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ is an affinely dependent set in $\mathbb{R}^n$, then the
set $\{\tilde{\mathbf{v}}_1, \dots, \tilde{\mathbf{v}}_p\}$ in $\mathbb{R}^{n+1}$ of homogeneous forms may be
linearly independent.

b. If $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$, and $\mathbf{v}_4$ are in $\mathbb{R}^3$ and if the set
$\{\mathbf{v}_2 - \mathbf{v}_1, \mathbf{v}_3 - \mathbf{v}_1, \mathbf{v}_4 - \mathbf{v}_1\}$ is linearly independent, then
$\{\mathbf{v}_1, \dots, \mathbf{v}_4\}$ is affinely independent.

c. Given $S = \{\mathbf{b}_1, \dots, \mathbf{b}_k\}$ in $\mathbb{R}^n$, each $\mathbf{p}$ in aff $S$ has
a unique representation as an affine combination of
$\mathbf{b}_1, \dots, \mathbf{b}_k$.

d. When color information is specified at each vertex $\mathbf{v}_1$, $\mathbf{v}_2$,
$\mathbf{v}_3$ of a triangle in $\mathbb{R}^3$, then the color may be interpolated
at a point $\mathbf{p}$ in aff$\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ using the barycentric coordinates
of $\mathbf{p}$.

e. If $T$ is a triangle in $\mathbb{R}^2$ and if a point $\mathbf{p}$ is on an edge of
the triangle, then the barycentric coordinates of $\mathbf{p}$ (for this
triangle) are not all positive.

11. Explain why any set of five or more points in $\mathbb{R}^3$ must be
affinely dependent.

12. Show that a set $\{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ in $\mathbb{R}^n$ is affinely dependent when
$p \ge n + 2$.

13. Use only the definition of affine dependence to show that an
indexed set $\{\mathbf{v}_1, \mathbf{v}_2\}$ in $\mathbb{R}^n$ is affinely dependent if and only if
$\mathbf{v}_1 = \mathbf{v}_2$.

14. The conditions for affine dependence are stronger than those
for linear dependence, so an affinely dependent set is automatically
linearly dependent. Also, a linearly independent set
cannot be affinely dependent and therefore must be affinely
independent. Construct two linearly dependent indexed sets
$S_1$ and $S_2$ in $\mathbb{R}^2$ such that $S_1$ is affinely dependent and $S_2$
is affinely independent. In each case, the set should contain
either one, two, or three nonzero points.


and let $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$.

a. Show that the set S is affinely independent. 

b. Find the barycentric coordinates of = 


P 2 = 

" 1 " 

2 

> P 3 = 

~- 2 " 

1 

» P 4 = 

1 " 
-1 

, and p 5 = 

" 1 " 

1 J’ 


with respect to S . 


c. Let $T$ be the triangle with vertices $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$. When
the sides of $T$ are extended, the lines divide $\mathbb{R}^2$ into seven
regions. See Figure 8. Note the signs of the barycentric
coordinates of the points in each region. For example, $\mathbf{p}_5$
is inside the triangle $T$ and all its barycentric coordinates
are positive. Point $\mathbf{p}_1$ has coordinates $(-, +, +)$. Its third
coordinate is positive because $\mathbf{p}_1$ is on the $\mathbf{v}_3$ side of the
line through $\mathbf{v}_1$ and $\mathbf{v}_2$. Its first coordinate is negative
because $\mathbf{p}_1$ is opposite the $\mathbf{v}_1$ side of the line through $\mathbf{v}_2$
and $\mathbf{v}_3$. Point $\mathbf{p}_2$ is on the $\mathbf{v}_2\mathbf{v}_3$ edge of $T$. Its coordinates
are $(0, +, +)$. Without calculating the actual values, determine
the signs of the barycentric coordinates of points
$\mathbf{p}_6$, $\mathbf{p}_7$, and $\mathbf{p}_8$ as shown in Figure 8.

FIGURE 8



16. Let 








, and $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$.

a. Show that the set $S$ is affinely independent.

b. Find the barycentric coordinates of $\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$ with
respect to $S$.

c. On graph paper, sketch the triangle $T$ with vertices $\mathbf{v}_1$,
$\mathbf{v}_2$, and $\mathbf{v}_3$, extend the sides as in Figure 8, and plot the
points $\mathbf{p}_4$, $\mathbf{p}_5$, $\mathbf{p}_6$, and $\mathbf{p}_7$. Without calculating the actual
values, determine the signs of the barycentric coordinates
of points $\mathbf{p}_4$, $\mathbf{p}_5$, $\mathbf{p}_6$, and $\mathbf{p}_7$.


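17. Prove Theorem 6 for an affinely independent set
$S = \{\mathbf{v}_1, \dots, \mathbf{v}_k\}$ in $\mathbb{R}^n$. [Hint: One method is to mimic the
proof of Theorem 7 in Section 4.4.]

18. Let $T$ be a tetrahedron in "standard" position, with three
edges along the three positive coordinate axes in $\mathbb{R}^3$,
and suppose the vertices are $a\mathbf{e}_1$, $b\mathbf{e}_2$, $c\mathbf{e}_3$, and $\mathbf{0}$, where
$\begin{bmatrix} \mathbf{e}_1 & \mathbf{e}_2 & \mathbf{e}_3 \end{bmatrix} = I_3$. Find formulas for the barycentric coordinates
of an arbitrary point $\mathbf{p}$ in $\mathbb{R}^3$.

19. Let $\{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$ be an affinely dependent set of points in $\mathbb{R}^n$
and let $f : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation. Show that
$\{f(\mathbf{p}_1), f(\mathbf{p}_2), f(\mathbf{p}_3)\}$ is affinely dependent in $\mathbb{R}^m$.

20. Suppose that $\{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$ is an affinely independent set in $\mathbb{R}^n$
and $\mathbf{q}$ is an arbitrary point in $\mathbb{R}^n$. Show that the translated set
$\{\mathbf{p}_1 + \mathbf{q}, \mathbf{p}_2 + \mathbf{q}, \mathbf{p}_3 + \mathbf{q}\}$ is also affinely independent.


In Exercises 21-24, $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ are noncollinear points in $\mathbb{R}^2$ and
$\mathbf{p}$ is any other point in $\mathbb{R}^2$. Let $\triangle\mathbf{abc}$ denote the closed triangular
region determined by $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$, and let $\triangle\mathbf{pbc}$ be the region
determined by $\mathbf{p}$, $\mathbf{b}$, and $\mathbf{c}$. For convenience, assume that $\mathbf{a}$, $\mathbf{b}$, and
$\mathbf{c}$ are arranged so that $\det\begin{bmatrix} \tilde{\mathbf{a}} & \tilde{\mathbf{b}} & \tilde{\mathbf{c}} \end{bmatrix}$ is positive, where $\tilde{\mathbf{a}}$, $\tilde{\mathbf{b}}$, and
$\tilde{\mathbf{c}}$ are the standard homogeneous forms for the points.

21. Show that the area of $\triangle\mathbf{abc}$ is $\det\begin{bmatrix} \tilde{\mathbf{a}} & \tilde{\mathbf{b}} & \tilde{\mathbf{c}} \end{bmatrix}/2$. [Hint: Consult
Sections 3.2 and 3.3, including the Exercises.]

22. Let $\mathbf{p}$ be a point on the line through $\mathbf{a}$ and $\mathbf{b}$. Show that
$\det\begin{bmatrix} \tilde{\mathbf{a}} & \tilde{\mathbf{b}} & \tilde{\mathbf{p}} \end{bmatrix} = 0$.

23. Let $\mathbf{p}$ be any point in the interior of $\triangle\mathbf{abc}$, with barycentric
coordinates $(r, s, t)$, so that

\[
\begin{bmatrix} \tilde{\mathbf{a}} & \tilde{\mathbf{b}} & \tilde{\mathbf{c}} \end{bmatrix}
\begin{bmatrix} r \\ s \\ t \end{bmatrix} = \tilde{\mathbf{p}}
\]

Use Exercise 21 and a fact about determinants (Chapter 3) to
show that

r = (area of $\triangle\mathbf{pbc}$)/(area of $\triangle\mathbf{abc}$)
s = (area of $\triangle\mathbf{apc}$)/(area of $\triangle\mathbf{abc}$)
t = (area of $\triangle\mathbf{abp}$)/(area of $\triangle\mathbf{abc}$)

24. Take $\mathbf{q}$ on the line segment from $\mathbf{b}$ to $\mathbf{c}$ and consider the line
through $\mathbf{q}$ and $\mathbf{a}$, which may be written as $\mathbf{p} = (1 - x)\mathbf{q} + x\mathbf{a}$
for all real $x$. Show that, for each $x$, $\det\begin{bmatrix} \tilde{\mathbf{p}} & \tilde{\mathbf{b}} & \tilde{\mathbf{c}} \end{bmatrix} =
x \cdot \det\begin{bmatrix} \tilde{\mathbf{a}} & \tilde{\mathbf{b}} & \tilde{\mathbf{c}} \end{bmatrix}$. From this and earlier work, conclude that
the parameter $x$ is the first barycentric coordinate of $\mathbf{p}$. However,
by construction, the parameter $x$ also determines the
relative distance between $\mathbf{p}$ and $\mathbf{q}$ along the segment from
$\mathbf{q}$ to $\mathbf{a}$. (When $x = 1$, $\mathbf{p} = \mathbf{a}$.) When this fact is applied to
Example 5, it shows that the colors at vertex $\mathbf{a}$ and the point $\mathbf{q}$
are smoothly interpolated as $\mathbf{p}$ moves along the line between
$\mathbf{a}$ and $\mathbf{q}$.

7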


3 


0 

25. Let Vi = 

1 

u> 

1_ 

, V 2 = 

1 

Lh LO 

1_ 

, V 3 = 

1 

On <N 

_1 

, a = 

1 

0 

_ 1 



, and $\mathbf{x}(t) = \mathbf{a} + t\mathbf{b}$ for $t \ge 0$. Find the point
where the ray $\mathbf{x}(t)$ intersects the plane that contains the
triangle with vertices $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$. Is this point inside the
triangle?



1 


8" 

26. Repeat Exercise 25 with Vi = 

1 

(N ^ 

_1 

, v 2 = 

2 

-5 



3 


0 


.9“ 

v 3 = 

1 

O <N 

_1 

, a = 

1 

0 00 

_ 1 

, and b = 

2.0 

-3.7

SOLUTIONS TO PRACTICE PROBLEMS

1. From Example 1, the problem is to determine if the points are affinely dependent.
Use the method of Example 2 and subtract one point from the other two. If one of
these two new points is a multiple of the other, the original three points lie on a line.

2. The proof of Theorem 5 essentially points out that an affine dependence relation
among points corresponds to a linear dependence relation among the homogeneous
forms of the points, using the same weights. So, row reduce:

\[
\begin{bmatrix} \tilde{\mathbf{v}}_1 & \tilde{\mathbf{v}}_2 & \tilde{\mathbf{v}}_3 & \tilde{\mathbf{v}}_4 \end{bmatrix}
=
\begin{bmatrix} 4 & 1 & 5 & 1 \\ 1 & 0 & 4 & 2 \\ 1 & 1 & 1 & 1 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & 1.25 \\ 0 & 0 & 1 & .75 \end{bmatrix}
\]

View this matrix as the coefficient matrix for $A\mathbf{x} = \mathbf{0}$ with four variables. Then $x_4$
is free, $x_1 = x_4$, $x_2 = -1.25x_4$, and $x_3 = -.75x_4$. One solution is $x_1 = x_4 = 4$,
$x_2 = -5$, and $x_3 = -3$. A linear dependence among the homogeneous forms is
$4\tilde{\mathbf{v}}_1 - 5\tilde{\mathbf{v}}_2 - 3\tilde{\mathbf{v}}_3 + 4\tilde{\mathbf{v}}_4 = \mathbf{0}$. So $4\mathbf{v}_1 - 5\mathbf{v}_2 - 3\mathbf{v}_3 + 4\mathbf{v}_4 = \mathbf{0}$.

Another solution method is to translate the problem to the origin by subtracting
$\mathbf{v}_1$ from the other points, find a linear dependence relation among the translated
points, and then rearrange the terms. The amount of arithmetic involved is about
the same as in the approach shown above.
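
The first solution method (a linear dependence relation among the homogeneous forms) can be sketched in code. The following Python sketch is illustrative only: the function name `affine_dependence_relation` is hypothetical, and the choice to set the first free variable to 1 (and any others to 0) is an assumption made here to pick one particular relation.

```python
from fractions import Fraction

def affine_dependence_relation(points):
    """Return weights, not all zero, with sum 0 and sum(c_i v_i) = 0, or None.

    Finds a nontrivial solution of A c = 0, where the columns of A are the
    homogeneous forms of the points (statement (d) of Theorem 5).
    """
    p = len(points)
    rows = [[Fraction(v[i]) for v in points] for i in range(len(points[0]))]
    rows.append([Fraction(1)] * p)  # the row of ones

    pivot_cols = []
    r = 0
    for col in range(p):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        rows[r] = [x / rows[r][col] for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(col)
        r += 1
        if r == len(rows):
            break

    free_cols = [c for c in range(p) if c not in pivot_cols]
    if not free_cols:
        return None  # only the trivial solution: the set is affinely independent
    free = free_cols[0]
    weights = [Fraction(0)] * p
    weights[free] = Fraction(1)
    for i, col in enumerate(pivot_cols):
        weights[col] = -rows[i][free]
    return weights

# Points of Practice Problem 2; scaling the result by 4 gives 4, -5, -3, 4.
relation = affine_dependence_relation([(4, 1), (1, 0), (5, 4), (1, 2)])
```

Any nonzero multiple of the returned weights is also an affine dependence relation, which is why the solution above is free to choose $x_4 = 4$ to clear the fractions.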


8.3 CONVEX COMBINATIONS 


Section 8.1 considered special linear combinations of the form

\[ c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k, \quad\text{where } c_1 + c_2 + \cdots + c_k = 1 \]

This section further restricts the weights to be nonnegative.


DEFINITION


A convex combination of points $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k$ in $\mathbb{R}^n$ is a linear combination of
the form

\[ c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k \]

such that $c_1 + c_2 + \cdots + c_k = 1$ and $c_i \ge 0$ for all $i$. The set of all convex
combinations of points in a set $S$ is called the convex hull of $S$, denoted by conv $S$.


The convex hull of a single point $\mathbf{v}_1$ is just the set $\{\mathbf{v}_1\}$, the same as the affine hull.
In other cases, the convex hull is properly contained in the affine hull. Recall that the
affine hull of distinct points $\mathbf{v}_1$ and $\mathbf{v}_2$ is the line

\[ \mathbf{y} = (1 - t)\mathbf{v}_1 + t\mathbf{v}_2, \quad\text{with } t \text{ in } \mathbb{R} \]

Because the weights in a convex combination are nonnegative, the points in conv$\{\mathbf{v}_1, \mathbf{v}_2\}$
may be written as

\[ \mathbf{y} = (1 - t)\mathbf{v}_1 + t\mathbf{v}_2, \quad\text{with } 0 \le t \le 1 \]

which is the line segment between $\mathbf{v}_1$ and $\mathbf{v}_2$, hereafter denoted by $\overline{\mathbf{v}_1\mathbf{v}_2}$.

If a set S is affinely independent and if p € aff S , then p e conv S if and only if 
the barycentric coordinates of p are nonnegative. Example 1 shows a special situation 
in which S is much more than just affinely independent. 
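EXAMPLE 1 Let

$$\mathbf{v}_1 = \begin{bmatrix} 3 \\ 0 \\ 6 \\ -3 \end{bmatrix},\quad \mathbf{v}_2 = \begin{bmatrix} -6 \\ 3 \\ 3 \\ 0 \end{bmatrix},\quad \mathbf{v}_3 = \begin{bmatrix} 3 \\ 6 \\ 0 \\ 3 \end{bmatrix},\quad \mathbf{p}_1 = \begin{bmatrix} 0 \\ 3 \\ 3 \\ 0 \end{bmatrix},\quad \mathbf{p}_2 = \begin{bmatrix} -10 \\ 5 \\ 11 \\ -4 \end{bmatrix},$$

and $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. Note that $S$ is an orthogonal set. Determine whether $\mathbf{p}_1$ is in
Span $S$, aff $S$, and conv $S$. Then do the same for $\mathbf{p}_2$.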

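SOLUTION If $\mathbf{p}_1$ is at least a linear combination of the points in $S$, then the weights are
easily found, because $S$ is an orthogonal set. Let $W$ be the subspace spanned by $S$. A
calculation as in Section 6.3 shows that the orthogonal projection of $\mathbf{p}_1$ onto $W$ is $\mathbf{p}_1$ itself:

$$\operatorname{proj}_W \mathbf{p}_1 = \frac{\mathbf{p}_1\cdot\mathbf{v}_1}{\mathbf{v}_1\cdot\mathbf{v}_1}\mathbf{v}_1 + \frac{\mathbf{p}_1\cdot\mathbf{v}_2}{\mathbf{v}_2\cdot\mathbf{v}_2}\mathbf{v}_2 + \frac{\mathbf{p}_1\cdot\mathbf{v}_3}{\mathbf{v}_3\cdot\mathbf{v}_3}\mathbf{v}_3$$

$$= \frac{1}{3}\begin{bmatrix} 3 \\ 0 \\ 6 \\ -3 \end{bmatrix} + \frac{1}{3}\begin{bmatrix} -6 \\ 3 \\ 3 \\ 0 \end{bmatrix} + \frac{1}{3}\begin{bmatrix} 3 \\ 6 \\ 0 \\ 3 \end{bmatrix} = \begin{bmatrix} 0 \\ 3 \\ 3 \\ 0 \end{bmatrix} = \mathbf{p}_1$$

This shows that $\mathbf{p}_1$ is in Span $S$. Also, since the coefficients sum to 1, $\mathbf{p}_1$ is in aff $S$. In
fact, $\mathbf{p}_1$ is in conv $S$, because the coefficients are also nonnegative.

For $\mathbf{p}_2$, a similar calculation shows that $\operatorname{proj}_W \mathbf{p}_2 \ne \mathbf{p}_2$. Since $\operatorname{proj}_W \mathbf{p}_2$ is the closest
point in Span $S$ to $\mathbf{p}_2$, the point $\mathbf{p}_2$ is not in Span $S$. In particular, $\mathbf{p}_2$ cannot be in aff $S$
or conv $S$. ■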
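The test used in Example 1 can be sketched in a few lines of Python. This is only an illustration, assuming the points of $S$ form an orthogonal set of nonzero vectors; the helper names `dot` and `classify` are invented here and do not come from the text. Exact fractions avoid rounding questions when comparing the projection with the original point.

```python
from fractions import Fraction

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def classify(p, orthogonal_set):
    """Return (in_span, in_aff, in_conv, weights) for p relative to an orthogonal set."""
    # Weights of the orthogonal projection of p onto the span of the set.
    weights = [Fraction(dot(p, v), dot(v, v)) for v in orthogonal_set]
    proj = [sum(w * v[i] for w, v in zip(weights, orthogonal_set))
            for i in range(len(p))]
    in_span = proj == list(p)                      # p equals its own projection
    in_aff = in_span and sum(weights) == 1         # weights sum to 1
    in_conv = in_aff and all(w >= 0 for w in weights)  # and are nonnegative
    return in_span, in_aff, in_conv, weights

S = [(3, 0, 6, -3), (-6, 3, 3, 0), (3, 6, 0, 3)]
p1 = (0, 3, 3, 0)
print(classify(p1, S))
```

For $\mathbf{p}_1$ the three weights are each $1/3$, so all three tests succeed, in agreement with the solution above.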


Recall that a set S is affine if it contains all lines determined by pairs of points in S. 
When attention is restricted to convex combinations, the appropriate condition involves 
line segments rather than lines. 


DEFINITION 


A set $S$ is convex if for each $\mathbf{p}, \mathbf{q} \in S$, the line segment $\overline{\mathbf{pq}}$ is contained in $S$.


Intuitively, a set S is convex if every two points in the set can “see” each other 
without the line of sight leaving the set. Figure 1 illustrates this idea. 





FIGURE 1 (figure omitted; one panel is labeled "Not convex")


The next result is analogous to Theorem 2 for affine sets. 


THEOREM 7 


A set S is convex if and only if every convex combination of points of S lies in 
S . That is, S is convex if and only if S = conv S . 


PROOF The argument is similar to the proof of Theorem 2. The only difference is
in the induction step. When taking a convex combination of $k + 1$ points, consider
$\mathbf{y} = c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k + c_{k+1}\mathbf{v}_{k+1}$, where $c_1 + \cdots + c_{k+1} = 1$ and $0 \le c_i \le 1$ for
all $i$. If $c_{k+1} = 1$, then $\mathbf{y} = \mathbf{v}_{k+1}$, which belongs to $S$, and there is nothing further to
prove. If $c_{k+1} < 1$, let $t = c_1 + \cdots + c_k$. Then $t = 1 - c_{k+1} > 0$ and

$$\mathbf{y} = (1 - c_{k+1})\left(\frac{c_1}{t}\mathbf{v}_1 + \cdots + \frac{c_k}{t}\mathbf{v}_k\right) + c_{k+1}\mathbf{v}_{k+1} \qquad (1)$$

By the induction hypothesis, the point $\mathbf{z} = (c_1/t)\mathbf{v}_1 + \cdots + (c_k/t)\mathbf{v}_k$ is in $S$, since the
nonnegative coefficients sum to 1. Thus equation (1) displays $\mathbf{y}$ as a convex combination
of two points in $S$. By the principle of induction, every convex combination of such
points lies in $S$. ■

Theorem 9 below provides a more geometric characterization of the convex hull
of a set. It requires a preliminary result on intersections of sets. Recall from Section
4.1 (Exercise 32) that the intersection of two subspaces is itself a subspace. In fact, the
intersection of any collection of subspaces is itself a subspace. A similar result holds for
affine sets and convex sets.


THEOREM 8


Let $\{S_\alpha : \alpha \in \mathcal{A}\}$ be any collection of convex sets. Then $\bigcap_{\alpha \in \mathcal{A}} S_\alpha$ is convex. If
$\{T_\beta : \beta \in \mathcal{B}\}$ is any collection of affine sets, then $\bigcap_{\beta \in \mathcal{B}} T_\beta$ is affine.


PROOF If $\mathbf{p}$ and $\mathbf{q}$ are in $\bigcap S_\alpha$, then $\mathbf{p}$ and $\mathbf{q}$ are in each $S_\alpha$. Since each $S_\alpha$ is convex,
the line segment between $\mathbf{p}$ and $\mathbf{q}$ is in $S_\alpha$ for all $\alpha$ and hence that segment is contained
in $\bigcap S_\alpha$. The proof of the affine case is similar. ■


THEOREM 9


For any set $S$, the convex hull of $S$ is the intersection of all the convex sets that
contain $S$.


PROOF Let $T$ denote the intersection of all the convex sets containing $S$. Since conv $S$
is a convex set containing $S$, it follows that $T \subset$ conv $S$. On the other hand, let $C$ be
any convex set containing $S$. Then $C$ contains every convex combination of points of
$C$ (Theorem 7), and hence also contains every convex combination of points of the
subset $S$. That is, conv $S \subset C$. Since this is true for every convex set $C$ containing $S$,
it is also true for the intersection of them all. That is, conv $S \subset T$. ■


Theorem 9 shows that conv $S$ is in a natural sense the “smallest” convex set
containing $S$. For example, consider a set $S$ that lies inside some large rectangle in $\mathbb{R}^2$, and
imagine stretching a rubber band around the outside of S . As the rubber band contracts 
around S , it outlines the boundary of the convex hull of S . Or to use another analogy, 
the convex hull of S fills in all the holes in the inside of S and fills out all the dents in 
the boundary of S . 

EXAMPLE 2 


a. The convex hulls of sets $S$ and $T$ in $\mathbb{R}^2$ are shown below.

(figure omitted; panels labeled $S$, conv $S$, $T$, and conv $T$)

b. Let $S$ be the set consisting of the standard basis for $\mathbb{R}^3$, $S = \{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$. Then conv $S$
is a triangular surface in $\mathbb{R}^3$, with vertices $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$. See Figure 2.

FIGURE 2

FIGURE 3 (figure omitted; axes $x$ and $y$, with the curve $y = x^2$)


EXAMPLE 3 Let $S = \left\{ \begin{bmatrix} x \\ y \end{bmatrix} : x \ge 0 \text{ and } y = x^2 \right\}$. Show that the convex hull of
$S$ is the union of the origin and $\left\{ \begin{bmatrix} x \\ y \end{bmatrix} : x > 0 \text{ and } y \ge x^2 \right\}$. See Figure 3.

SOLUTION Every point in conv $S$ must lie on a line segment that connects two points
of $S$. The dashed line in Figure 3 indicates that, except for the origin, the positive
$y$-axis is not in conv $S$, because the origin is the only point of $S$ on the $y$-axis. It may
seem reasonable that Figure 3 does show conv $S$, but how can you be sure that the point
$(10^{-2}, 10^4)$, for example, is on a line segment from the origin to a point on the curve in
$S$?

Consider any point $\mathbf{p}$ in the shaded region of Figure 3, say

$$\mathbf{p} = \begin{bmatrix} a \\ b \end{bmatrix}, \quad \text{with } a > 0 \text{ and } b \ge a^2$$

The line through $\mathbf{0}$ and $\mathbf{p}$ has the equation $y = (b/a)t$ for $t$ real. That line intersects
$S$ where $t$ satisfies $(b/a)t = t^2$, that is, when $t = b/a$. Thus, $\mathbf{p}$ is on the line segment
from $\mathbf{0}$ to $\begin{bmatrix} b/a \\ b^2/a^2 \end{bmatrix}$, which shows that Figure 3 is correct. ■
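The argument in Example 3 is easy to check numerically for the point $(10^{-2}, 10^4)$. The short sketch below is an illustration only; the function name `segment_endpoint` is made up for this purpose. It finds the point $\mathbf{q}$ on the parabola and the scalar $s$ in $(0, 1]$ with $\mathbf{p} = s\mathbf{q}$, so that $\mathbf{p}$ lies on the segment from the origin to $\mathbf{q}$.

```python
from fractions import Fraction

def segment_endpoint(a, b):
    """For p = (a, b) with a > 0 and b >= a^2, return (q, s) where
    q = (b/a, (b/a)^2) lies on y = x^2 and p = s * q with 0 < s <= 1."""
    a, b = Fraction(a), Fraction(b)
    if not (a > 0 and b >= a * a):
        raise ValueError("p must satisfy a > 0 and b >= a^2")
    t = b / a              # where the line through 0 and p meets the parabola
    q = (t, t * t)
    s = a / t              # equals a^2 / b, so 0 < s <= 1
    return q, s

q, s = segment_endpoint(Fraction(1, 100), 10**4)
print(q, s)
```

Here $\mathbf{q} = (10^6, 10^{12})$ and $s = 10^{-8}$, which shows how far out along the curve one must go to capture a point that is close to the $y$-axis.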


The following theorem is basic in the study of convex sets. It was first proved by
Constantin Carathéodory in 1907. If $\mathbf{p}$ is in the convex hull of $S$, then, by definition, $\mathbf{p}$
must be a convex combination of points of $S$. But the definition makes no stipulation
as to how many points of $S$ are required to make the combination. Carathéodory’s
remarkable theorem says that in an $n$-dimensional space, the number of points of $S$
in the convex combination never has to be more than $n + 1$.


THEOREM 10 


(Carathéodory) If $S$ is a nonempty subset of $\mathbb{R}^n$, then every point in conv $S$ can
be expressed as a convex combination of $n + 1$ or fewer points of $S$.


PROOF Given $\mathbf{p}$ in conv $S$, one may write $\mathbf{p} = c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k$, where $\mathbf{v}_i \in S$,
$c_1 + \cdots + c_k = 1$, and $c_i \ge 0$, for some $k$ and $i = 1, \ldots, k$. The goal is to show that
such an expression exists for $\mathbf{p}$ with $k \le n + 1$.

If $k > n + 1$, then $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ is affinely dependent, by Exercise 12 in Section 8.2.
Thus there exist scalars $d_1, \ldots, d_k$, not all zero, such that

$$\sum_{i=1}^{k} d_i\mathbf{v}_i = \mathbf{0} \quad \text{and} \quad \sum_{i=1}^{k} d_i = 0$$

Consider the two equations

$$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k = \mathbf{p}$$

$$d_1\mathbf{v}_1 + d_2\mathbf{v}_2 + \cdots + d_k\mathbf{v}_k = \mathbf{0}$$

By subtracting an appropriate multiple of the second equation from the first, we now
eliminate one of the $\mathbf{v}_i$ terms and obtain a convex combination of fewer than $k$ elements
of $S$ that is equal to $\mathbf{p}$.

Since not all of the $d_i$ coefficients are zero, we may assume (by reordering subscripts
if necessary) that $d_k > 0$ and that $c_k/d_k \le c_i/d_i$ for all those $i$ for which $d_i > 0$.
For $i = 1, \ldots, k$, let $b_i = c_i - (c_k/d_k)d_i$. Then $b_k = 0$ and

$$\sum_{i=1}^{k} b_i = \sum_{i=1}^{k} c_i - \frac{c_k}{d_k}\sum_{i=1}^{k} d_i = 1 - 0 = 1$$

Furthermore, each $b_i \ge 0$. Indeed, if $d_i \le 0$, then $b_i \ge c_i \ge 0$. If $d_i > 0$, then
$b_i = d_i(c_i/d_i - c_k/d_k) \ge 0$. By construction,

$$\sum_{i=1}^{k-1} b_i\mathbf{v}_i = \sum_{i=1}^{k} b_i\mathbf{v}_i = \sum_{i=1}^{k}\left(c_i - \frac{c_k}{d_k}d_i\right)\mathbf{v}_i = \sum_{i=1}^{k} c_i\mathbf{v}_i - \frac{c_k}{d_k}\sum_{i=1}^{k} d_i\mathbf{v}_i = \sum_{i=1}^{k} c_i\mathbf{v}_i = \mathbf{p}$$

Thus $\mathbf{p}$ is now a convex combination of $k - 1$ of the points $\mathbf{v}_1, \ldots, \mathbf{v}_k$. This process may
be repeated until $\mathbf{p}$ is expressed as a convex combination of at most $n + 1$ of the points
of $S$. ■


The following example illustrates the calculations in the proof above. 


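EXAMPLE 4 Let

$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix},\quad \mathbf{v}_2 = \begin{bmatrix} 2 \\ 3 \end{bmatrix},\quad \mathbf{v}_3 = \begin{bmatrix} 5 \\ 4 \end{bmatrix},\quad \mathbf{v}_4 = \begin{bmatrix} 3 \\ 0 \end{bmatrix},\quad \mathbf{p} = \begin{bmatrix} 10/3 \\ 5/2 \end{bmatrix},$$

and $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$. Then

$$\tfrac{1}{4}\mathbf{v}_1 + \tfrac{1}{6}\mathbf{v}_2 + \tfrac{1}{2}\mathbf{v}_3 + \tfrac{1}{12}\mathbf{v}_4 = \mathbf{p} \qquad (2)$$

Use the procedure in the proof of Carathéodory’s Theorem to express $\mathbf{p}$ as a convex
combination of three points of $S$.

SOLUTION The set $S$ is affinely dependent. Use the technique of Section 8.2 to obtain
an affine dependence relation

$$-5\mathbf{v}_1 + 4\mathbf{v}_2 - 3\mathbf{v}_3 + 4\mathbf{v}_4 = \mathbf{0} \qquad (3)$$

Next, choose the points $\mathbf{v}_2$ and $\mathbf{v}_4$ in (3), whose coefficients are positive. For each
point, compute the ratio of the coefficients in equations (2) and (3). The ratio for $\mathbf{v}_2$
is $\tfrac{1}{6} \div 4 = \tfrac{1}{24}$, and that for $\mathbf{v}_4$ is $\tfrac{1}{12} \div 4 = \tfrac{1}{48}$. The ratio for $\mathbf{v}_4$ is smaller, so subtract $\tfrac{1}{48}$
times equation (3) from equation (2) to eliminate $\mathbf{v}_4$:

$$\left(\tfrac{1}{4} + \tfrac{5}{48}\right)\mathbf{v}_1 + \left(\tfrac{1}{6} - \tfrac{4}{48}\right)\mathbf{v}_2 + \left(\tfrac{1}{2} + \tfrac{3}{48}\right)\mathbf{v}_3 + \left(\tfrac{1}{12} - \tfrac{4}{48}\right)\mathbf{v}_4 = \mathbf{p}$$

$$\tfrac{17}{48}\mathbf{v}_1 + \tfrac{4}{48}\mathbf{v}_2 + \tfrac{27}{48}\mathbf{v}_3 = \mathbf{p}$$ ■

This result cannot, in general, be improved by decreasing the required number of
points. Indeed, given any three non-collinear points in $\mathbb{R}^2$, the centroid of the triangle
formed by them is in the convex hull of all three, but is not in the convex hull of any two.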
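The elimination step from the proof of Carathéodory’s Theorem can be reproduced with exact rational arithmetic. The sketch below is an illustration under stated assumptions: the affine dependence weights $d_i$ are supplied by the caller (they are not computed here), and the helper name `caratheodory_step` is hypothetical. The data are those of Example 4.

```python
from fractions import Fraction as F

def caratheodory_step(c, d):
    """One step of the proof: given convex weights c and affine dependence
    weights d (sum zero, not all zero), return new weights with one entry 0."""
    # Among indices with d_i > 0, pick the one with the smallest ratio c_i / d_i.
    j = min((i for i in range(len(d)) if d[i] > 0), key=lambda i: c[i] / d[i])
    ratio = c[j] / d[j]
    # b_i = c_i - (c_j / d_j) d_i, so b_j = 0 and every b_i stays nonnegative.
    return [ci - ratio * di for ci, di in zip(c, d)]

V = [(1, 0), (2, 3), (5, 4), (3, 0)]            # v1, v2, v3, v4
c = [F(1, 4), F(1, 6), F(1, 2), F(1, 12)]       # weights in equation (2)
d = [F(-5), F(4), F(-3), F(4)]                  # weights in equation (3)

b = caratheodory_step(c, d)
p = tuple(sum(bi * v[k] for bi, v in zip(b, V)) for k in range(2))
print(b, p)
```

The resulting weights are $17/48$, $1/12$, $9/16$, and $0$, and the combination is still $(10/3, 5/2)$. Repeating the step with a new dependence relation would reduce the number of points further whenever more than $n + 1$ remain.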
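PRACTICE PROBLEMS


1. Let $\mathbf{v}_1 = \begin{bmatrix} 6 \\ 2 \\ 2 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 7 \\ 1 \\ 5 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} -2 \\ 4 \\ -1 \end{bmatrix}$, $\mathbf{p}_1 = \begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix}$, and $\mathbf{p}_2 = \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}$, and let
$S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. Determine whether $\mathbf{p}_1$ and $\mathbf{p}_2$ are in conv $S$.


2. Let $S$ be the set of points on the curve $y = 1/x$ for $x > 0$. Explain geometrically
why conv $S$ consists of all points on and above the curve $S$.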


8.3 EXERCISES 


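1. In $\mathbb{R}^2$, let $S = \left\{ \begin{bmatrix} 0 \\ y \end{bmatrix} : 0 \le y < 1 \right\} \cup \left\{ \begin{bmatrix} 2 \\ 0 \end{bmatrix} \right\}$. Describe
(or sketch) the convex hull of $S$.


2. Describe the convex hull of the set $S$ of points $\begin{bmatrix} x \\ y \end{bmatrix}$ in $\mathbb{R}^2$
that satisfy the given conditions. Justify your answers. (Show
that an arbitrary point $\mathbf{p}$ in $S$ belongs to conv $S$.)

a. $y = 1/x$ and $x \ge 1/2$

b. $y = \sin x$

c. $y = x^{1/2}$ and $x \ge 0$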


3. Consider the points in Exercise 5 in Section 8.1. Which of
$\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$ are in conv $S$?

4. Consider the points in Exercise 6 in Section 8.1. Which of
$\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$ are in conv $S$?

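5. Let

$$\mathbf{v}_1 = \begin{bmatrix} -1 \\ -3 \\ 4 \end{bmatrix},\quad \mathbf{v}_2 = \begin{bmatrix} 0 \\ -3 \\ 1 \end{bmatrix},\quad \mathbf{v}_3 = \begin{bmatrix} 1 \\ -1 \\ 4 \end{bmatrix},\quad \mathbf{v}_4 = \begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix},\quad \mathbf{p}_1 = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix},\quad \mathbf{p}_2 = \begin{bmatrix} 0 \\ -2 \\ 2 \end{bmatrix},$$

and $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$. Determine whether $\mathbf{p}_1$ and $\mathbf{p}_2$ are in
conv $S$.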



Letvi = 


2" 


0 " 


~- 2 ~ 


-1 

0 


-2 


1 


2 

-1 

,v 2 = 

2 

,v 3 = 

0 

Vi = 

_3 

2 

2 


1 


2 


5 







L 2 J 



1 “ 

2 


6" 


"-1 “ 


0 


-4 

, and p 4 = 

-2 

P2 = 

1 

4 

^3 = 

1 

0 


7 

L 4 J 


_-l_ 


4 


and let S be 


the orthogonal set $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. Determine whether each $\mathbf{p}_i$ is
in Span $S$, aff $S$, or conv $S$.

a. p 1 b. p 2 c. p 3 d. p 4 


Exercises 7-10 use the terminology from Section 8.2. 


7. a. Let T = 



, and let 



and p 4 = 




Find the barycentric coordinates of $\mathbf{p}_1$, $\mathbf{p}_2$, $\mathbf{p}_3$, and $\mathbf{p}_4$ with
respect to $T$.

b. Use your answers in part (a) to determine whether each
of $\mathbf{p}_1, \ldots, \mathbf{p}_4$ in part (a) is inside, outside, or on the edge
of conv $T$, a triangular region.

8. Repeat Exercise 7 for T = 




and p 4 = 




9. Let S = {vi, v 2 , v 3 , v 4 } be an affinely independent set. Con¬ 
sider the points p 1 ,...,p 5 whose barycentric coordinates 
with respect to S are given by (2,0,0, —1), (0, 

(j,0, |, —1), (i, I), and (i,0, |, 0), respectively. De- 

termine whether each of p 1? ..., p 5 is inside, outside, or on 
the surface of conv S , a tetrahedron. Are any of these points 
on an edge of conv S ? 

10. Repeat Exercise 9 for the points q x ,..., q 5 whose barycen¬ 
tric coordinates with respect to S are given by (|, f, f), 

(O.f.i.O). (0,-2,0,3), and (i.i.i.0). 

respectively. 


In Exercises 11 and 12, mark each statement True or False. Justify 
each answer. 


11. a. If $\mathbf{y} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$ and $c_1 + c_2 + c_3 = 1$, then $\mathbf{y}$
is a convex combination of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$.

b. If $S$ is a nonempty set, then conv $S$ contains some points
that are not in $S$.

c. If $S$ and $T$ are convex sets, then $S \cup T$ is also convex.

12. a. A set is convex if $\mathbf{x}, \mathbf{y} \in S$ implies that the line segment
between $\mathbf{x}$ and $\mathbf{y}$ is contained in $S$.

b. If $S$ and $T$ are convex sets, then $S \cap T$ is also convex.

c. If $S$ is a nonempty subset of $\mathbb{R}^5$ and $\mathbf{y} \in$ conv $S$, then there
exist distinct points $\mathbf{v}_1, \ldots, \mathbf{v}_6$ in $S$ such that $\mathbf{y}$ is a convex
combination of $\mathbf{v}_1, \ldots, \mathbf{v}_6$.


13. Let $S$ be a convex subset of $\mathbb{R}^n$ and suppose that
$f : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation. Prove that the set
$f(S) = \{f(\mathbf{x}) : \mathbf{x} \in S\}$ is a convex subset of $\mathbb{R}^m$.


14. Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation and let
$T$ be a convex subset of $\mathbb{R}^m$. Prove that the set
$S = \{\mathbf{x} \in \mathbb{R}^n : f(\mathbf{x}) \in T\}$ is a convex subset of $\mathbb{R}^n$.



1 

1 _ 


1 

1 _ 


1 

1 _ 


1 

1 _ 

Vi = 

1 

0 

_ 1 

, V 2 = 

1 

to 

1 _ 

, V 3 = 

1 

to 

1 _ 

, V 4 = 

- 1 

O 

_ 1 


and 


P = 


2 

1 


. Confirm that 


p = 5 V 1 + }v 2 + ^v 3 + \\ A and Vi - v 2 + v 3 - v 4 = 0. 


Use the procedure in the proof of Carathéodory’s Theorem
to express $\mathbf{p}$ as a convex combination of three of the $\mathbf{v}_i$’s. Do
this in two ways.


16. Repeat Exercise 15 for points Vi = 






, given that 



p = tL v i + il v 2 + ir v 3 + ir v 4 

and 


$10\mathbf{v}_1 - 6\mathbf{v}_2 + 7\mathbf{v}_3 - 11\mathbf{v}_4 = \mathbf{0}$.

In Exercises 17-20, prove the given statement about subsets $A$
and $B$ of $\mathbb{R}^n$. A proof for an exercise may use results of earlier
exercises.

17. If $A \subset B$ and $B$ is convex, then conv $A \subset B$.

18. If $A \subset B$, then conv $A \subset$ conv $B$.

19. a. $[(\text{conv } A) \cup (\text{conv } B)] \subset \text{conv}\,(A \cup B)$

b. Find an example in $\mathbb{R}^2$ to show that equality need not hold
in part (a).

20. a. $\text{conv}\,(A \cap B) \subset [(\text{conv } A) \cap (\text{conv } B)]$

b. Find an example in $\mathbb{R}^2$ to show that equality need not hold
in part (a).

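21. Let $\mathbf{p}_0$, $\mathbf{p}_1$, and $\mathbf{p}_2$ be points in $\mathbb{R}^n$, and define
$\mathbf{f}_0(t) = (1 - t)\mathbf{p}_0 + t\mathbf{p}_1$, $\mathbf{f}_1(t) = (1 - t)\mathbf{p}_1 + t\mathbf{p}_2$, and
$\mathbf{g}(t) = (1 - t)\mathbf{f}_0(t) + t\mathbf{f}_1(t)$ for $0 \le t \le 1$. For the points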
as shown below, draw a picture that shows f 0 (^),fi ( 3 ), and 

g(D- 




2 



22. Repeat Exercise 21 for f 0 (|), fj (|), and g (|). 

23. Let $\mathbf{g}(t)$ be defined as in Exercise 21. Its graph is called
a quadratic Bézier curve, and it is used in some computer
graphics designs. The points $\mathbf{p}_0$, $\mathbf{p}_1$, and $\mathbf{p}_2$ are called the
control points for the curve. Compute a formula for $\mathbf{g}(t)$
that involves only $\mathbf{p}_0$, $\mathbf{p}_1$, and $\mathbf{p}_2$. Then show that $\mathbf{g}(t)$ is in
conv $\{\mathbf{p}_0, \mathbf{p}_1, \mathbf{p}_2\}$ for $0 \le t \le 1$.

24. Given control points $\mathbf{p}_0$, $\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$ in $\mathbb{R}^n$, let $\mathbf{g}_1(t)$
for $0 \le t \le 1$ be the quadratic Bézier curve from Exercise 23
determined by $\mathbf{p}_0$, $\mathbf{p}_1$, and $\mathbf{p}_2$, and let $\mathbf{g}_2(t)$ be
defined similarly for $\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$. For $0 \le t \le 1$, define
$\mathbf{h}(t) = (1 - t)\mathbf{g}_1(t) + t\mathbf{g}_2(t)$. Show that the graph of $\mathbf{h}(t)$
lies in the convex hull of the four control points. This curve
is called a cubic Bézier curve, and its definition here is one
step in an algorithm for constructing Bézier curves (discussed
later in Section 8.6). A Bézier curve of degree $k$ is determined
by $k + 1$ control points, and its graph lies in the convex hull
of these control points.


SOLUTIONS TO PRACTICE PROBLEMS 


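1. The points $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ are not orthogonal, so compute

$$\mathbf{v}_2 - \mathbf{v}_1 = \begin{bmatrix} 1 \\ -1 \\ 3 \end{bmatrix},\quad \mathbf{v}_3 - \mathbf{v}_1 = \begin{bmatrix} -8 \\ 2 \\ -3 \end{bmatrix},\quad \mathbf{p}_1 - \mathbf{v}_1 = \begin{bmatrix} -5 \\ 1 \\ -1 \end{bmatrix},\quad \text{and} \quad \mathbf{p}_2 - \mathbf{v}_1 = \begin{bmatrix} -3 \\ 0 \\ -1 \end{bmatrix}$$

Augment the matrix $[\,\mathbf{v}_2 - \mathbf{v}_1 \;\; \mathbf{v}_3 - \mathbf{v}_1\,]$ with both $\mathbf{p}_1 - \mathbf{v}_1$ and $\mathbf{p}_2 - \mathbf{v}_1$, and row
reduce:

$$\begin{bmatrix} 1 & -8 & -5 & -3 \\ -1 & 2 & 1 & 0 \\ 3 & -3 & -1 & -1 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & \tfrac{1}{3} & 1 \\ 0 & 1 & \tfrac{2}{3} & \tfrac{1}{2} \\ 0 & 0 & 0 & -\tfrac{5}{2} \end{bmatrix}$$

The third column shows that $\mathbf{p}_1 - \mathbf{v}_1 = \tfrac{1}{3}(\mathbf{v}_2 - \mathbf{v}_1) + \tfrac{2}{3}(\mathbf{v}_3 - \mathbf{v}_1)$, which leads to
$\mathbf{p}_1 = 0\mathbf{v}_1 + \tfrac{1}{3}\mathbf{v}_2 + \tfrac{2}{3}\mathbf{v}_3$. Thus $\mathbf{p}_1$ is in conv $S$. In fact, $\mathbf{p}_1$ is in conv $\{\mathbf{v}_2, \mathbf{v}_3\}$.

The last column of the matrix shows that $\mathbf{p}_2 - \mathbf{v}_1$ is not a linear combination of
$\mathbf{v}_2 - \mathbf{v}_1$ and $\mathbf{v}_3 - \mathbf{v}_1$. Thus $\mathbf{p}_2$ is not an affine combination of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$, so $\mathbf{p}_2$
cannot possibly be in conv $S$.

An alternative method of solution is to row reduce the augmented matrix of
homogeneous forms:

$$[\,\tilde{\mathbf{v}}_1 \;\; \tilde{\mathbf{v}}_2 \;\; \tilde{\mathbf{v}}_3 \;\; \tilde{\mathbf{p}}_1 \;\; \tilde{\mathbf{p}}_2\,] \sim \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & \tfrac{1}{3} & 0 \\ 0 & 0 & 1 & \tfrac{2}{3} & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$

2. If $\mathbf{p}$ is a point above $S$, then the line through $\mathbf{p}$ with slope $-1$ will intersect $S$ at two
points before it reaches the positive $x$- and $y$-axes.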

8.4 HYPERPLANES 


Hyperplanes play a special role in the geometry of $\mathbb{R}^n$ because they divide the space into
two disjoint pieces, just as a plane separates $\mathbb{R}^3$ into two parts and a line cuts through
$\mathbb{R}^2$. The key to working with hyperplanes is to use simple implicit descriptions, rather
than the explicit or parametric representations of lines and planes used in the earlier
work with affine sets.¹

An implicit equation of a line in $\mathbb{R}^2$ has the form $ax + by = d$. An implicit
equation of a plane in $\mathbb{R}^3$ has the form $ax + by + cz = d$. Both equations describe the
line or plane as the set of all points at which a linear expression (also called a linear
functional) has a fixed value, $d$.


DEFINITION 


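A linear functional on $\mathbb{R}^n$ is a linear transformation $f$ from $\mathbb{R}^n$ into $\mathbb{R}$. For each
scalar $d$ in $\mathbb{R}$, the symbol $[f : d]$ denotes the set of all $\mathbf{x}$ in $\mathbb{R}^n$ at which the value
of $f$ is $d$. That is,

$[f : d]$ is the set $\{\mathbf{x} \in \mathbb{R}^n : f(\mathbf{x}) = d\}$

The zero functional is the transformation such that $f(\mathbf{x}) = 0$ for all $\mathbf{x}$ in $\mathbb{R}^n$. All
other linear functionals on $\mathbb{R}^n$ are said to be nonzero.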



EXAMPLE 1 In $\mathbb{R}^2$, the line $x - 4y = 13$ is a hyperplane in $\mathbb{R}^2$, and it is the set
of points at which the linear functional $f(x, y) = x - 4y$ has the value 13. That is, the
line is the set $[f : 13]$. ■

EXAMPLE 2 In $\mathbb{R}^3$, the plane $5x - 2y + 3z = 21$ is a hyperplane, the set of points
at which the linear functional $g(x, y, z) = 5x - 2y + 3z$ has the value 21. This
hyperplane is the set $[g : 21]$. ■

If $f$ is a linear functional on $\mathbb{R}^n$, then the standard matrix of this linear
transformation $f$ is a $1 \times n$ matrix $A$, say $A = [\,a_1 \;\; a_2 \;\; \cdots \;\; a_n\,]$. So

$$[f : 0] \text{ is the same as } \{\mathbf{x} \in \mathbb{R}^n : A\mathbf{x} = 0\} = \operatorname{Nul} A \qquad (1)$$

¹ Parametric representations were introduced in Section 1.5.

If $f$ is a nonzero functional, then rank $A = 1$, and dim Nul $A = n - 1$, by the Rank
Theorem.² Thus, the subspace $[f : 0]$ has dimension $n - 1$ and so is a hyperplane. Also,
if $d$ is any number in $\mathbb{R}$, then

$$[f : d] \text{ is the same as } \{\mathbf{x} \in \mathbb{R}^n : A\mathbf{x} = d\} \qquad (2)$$

Recall from Theorem 6 in Section 1.5 that the set of solutions of $A\mathbf{x} = \mathbf{b}$ is obtained
by translating the solution set of $A\mathbf{x} = \mathbf{0}$, using any particular solution $\mathbf{p}$ of $A\mathbf{x} = \mathbf{b}$.
When $A$ is the standard matrix of the transformation $f$, this theorem says that

$$[f : d] = [f : 0] + \mathbf{p} \quad \text{for any } \mathbf{p} \text{ in } [f : d] \qquad (3)$$

Thus the sets $[f : d]$ are hyperplanes parallel to $[f : 0]$. See Figure 1.



FIGURE 1 Parallel hyperplanes,
with $f(\mathbf{p}) = d$.


When $A$ is a $1 \times n$ matrix, the equation $A\mathbf{x} = d$ may be written with an inner
product $\mathbf{n}\cdot\mathbf{x}$, using $\mathbf{n}$ in $\mathbb{R}^n$ with the same entries as $A$. Thus, from (2),

$$[f : d] \text{ is the same as } \{\mathbf{x} \in \mathbb{R}^n : \mathbf{n}\cdot\mathbf{x} = d\} \qquad (4)$$

Then $[f : 0] = \{\mathbf{x} \in \mathbb{R}^n : \mathbf{n}\cdot\mathbf{x} = 0\}$, which shows that $[f : 0]$ is the orthogonal
complement of the subspace spanned by $\mathbf{n}$. In the terminology of calculus and geometry for $\mathbb{R}^3$,
$\mathbf{n}$ is called a normal vector to $[f : 0]$. (A “normal” vector in this sense need not have unit
length.) Also, $\mathbf{n}$ is said to be normal to each parallel hyperplane $[f : d]$, even though
$\mathbf{n}\cdot\mathbf{x}$ is not zero when $d \ne 0$.

Another name for $[f : d]$ is a level set of $f$, and $\mathbf{n}$ is sometimes called the gradient
of $f$ when $f(\mathbf{x}) = \mathbf{n}\cdot\mathbf{x}$ for each $\mathbf{x}$.


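EXAMPLE 3 Let $\mathbf{n} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} 1 \\ -6 \end{bmatrix}$, and let $H = \{\mathbf{x} : \mathbf{n}\cdot\mathbf{x} = 12\}$, so $H =
[f : 12]$, where $f(x, y) = 3x + 4y$. Thus $H$ is the line $3x + 4y = 12$. Find an implicit
description of the parallel hyperplane (line) $H_1 = H + \mathbf{v}$.

SOLUTION First, find a point $\mathbf{p}$ in $H_1$. To do this, find a point in $H$ and add $\mathbf{v}$ to it.
For instance, $\begin{bmatrix} 0 \\ 3 \end{bmatrix}$ is in $H$, so $\mathbf{p} = \begin{bmatrix} 1 \\ -6 \end{bmatrix} + \begin{bmatrix} 0 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \\ -3 \end{bmatrix}$ is in $H_1$. Now, compute
$\mathbf{n}\cdot\mathbf{p} = -9$. This shows that $H_1 = [f : -9]$. See Figure 2, which also shows the
subspace $H_0 = \{\mathbf{x} : \mathbf{n}\cdot\mathbf{x} = 0\}$. ■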
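The calculation in Example 3 does not depend on which point of $H$ is chosen: if $\mathbf{n}\cdot\mathbf{x} = d$, then $\mathbf{n}\cdot(\mathbf{x} + \mathbf{v}) = d + \mathbf{n}\cdot\mathbf{v}$. A minimal Python sketch of this observation follows; the function name `translate_hyperplane` is invented for the illustration.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def translate_hyperplane(n, d, v):
    """Return d1 such that {x : n.x = d} + v equals {x : n.x = d1}."""
    return d + dot(n, v)

n, v = (3, 4), (1, -6)
d1 = translate_hyperplane(n, 12, v)
print(d1)  # -9, so H1 = [f : -9]
```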


The next three examples show connections between implicit and explicit
descriptions of hyperplanes. Example 4 begins with an implicit form.


2 See Theorem 14 in Section 2.9 or Theorem 14 in Section 4.6. 
FIGURE 2 (figure omitted; the line $H = [f : 12]$ is labeled)


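EXAMPLE 4 In $\mathbb{R}^2$, give an explicit description of the line $x - 4y = 13$ in
parametric vector form.


SOLUTION This amounts to solving a nonhomogeneous equation $A\mathbf{x} = \mathbf{b}$, where $A =
[\,1 \;\; -4\,]$ and $\mathbf{b}$ is the number 13 in $\mathbb{R}$. Write $x = 13 + 4y$, where $y$ is a free variable.
In parametric form, the solution is

$$\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 13 + 4y \\ y \end{bmatrix} = \begin{bmatrix} 13 \\ 0 \end{bmatrix} + y\begin{bmatrix} 4 \\ 1 \end{bmatrix} = \mathbf{p} + y\mathbf{q}, \quad y \in \mathbb{R}$$ ■

Converting an explicit description of a line into implicit form is more involved. The
basic idea is to construct $[f : 0]$ and then find $d$ for $[f : d]$.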


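EXAMPLE 5 Let $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix} 6 \\ 0 \end{bmatrix}$, and let $L_1$ be the line through $\mathbf{v}_1$ and
$\mathbf{v}_2$. Find a linear functional $f$ and a constant $d$ such that $L_1 = [f : d]$.


SOLUTION The line $L_1$ is parallel to the translated line $L_0$ through $\mathbf{v}_2 - \mathbf{v}_1$ and the
origin. The defining equation for $L_0$ has the form

$$[\,a \;\; b\,]\begin{bmatrix} x \\ y \end{bmatrix} = 0 \quad \text{or} \quad \mathbf{n}\cdot\mathbf{x} = 0, \quad \text{where } \mathbf{n} = \begin{bmatrix} a \\ b \end{bmatrix} \qquad (5)$$

Since $\mathbf{n}$ is orthogonal to the subspace $L_0$, which contains $\mathbf{v}_2 - \mathbf{v}_1$, compute

$$\mathbf{v}_2 - \mathbf{v}_1 = \begin{bmatrix} 6 \\ 0 \end{bmatrix} - \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 5 \\ -2 \end{bmatrix} \quad \text{and solve} \quad [\,a \;\; b\,]\begin{bmatrix} 5 \\ -2 \end{bmatrix} = 0$$

By inspection, a solution is $[\,a \;\; b\,] = [\,2 \;\; 5\,]$. Let $f(x, y) = 2x + 5y$. From (5),
$L_0 = [f : 0]$, and $L_1 = [f : d]$ for some $d$. Since $\mathbf{v}_1$ is on line $L_1$, $d = f(\mathbf{v}_1) =
2(1) + 5(2) = 12$. Thus, the equation for $L_1$ is $2x + 5y = 12$. As a check, note that
$f(\mathbf{v}_2) = f(6, 0) = 2(6) + 5(0) = 12$, so $\mathbf{v}_2$ is on $L_1$, too. ■


EXAMPLE 6 Let $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 2 \\ -1 \\ 4 \end{bmatrix}$, and $\mathbf{v}_3 = \begin{bmatrix} 3 \\ 1 \\ 2 \end{bmatrix}$. Find an implicit
description $[f : d]$ of the plane $H_1$ that passes through $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$.


SOLUTION $H_1$ is parallel to a plane $H_0$ through the origin that contains the translated
points

$$\mathbf{v}_2 - \mathbf{v}_1 = \begin{bmatrix} 1 \\ -2 \\ 3 \end{bmatrix} \quad \text{and} \quad \mathbf{v}_3 - \mathbf{v}_1 = \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}$$

Since these two points are linearly independent, $H_0 = \operatorname{Span}\{\mathbf{v}_2 - \mathbf{v}_1, \mathbf{v}_3 - \mathbf{v}_1\}$. Let
$\mathbf{n} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$ be the normal to $H_0$. Then $\mathbf{v}_2 - \mathbf{v}_1$ and $\mathbf{v}_3 - \mathbf{v}_1$ are each orthogonal to $\mathbf{n}$. That
is, $(\mathbf{v}_2 - \mathbf{v}_1)\cdot\mathbf{n} = 0$ and $(\mathbf{v}_3 - \mathbf{v}_1)\cdot\mathbf{n} = 0$. These two equations form a system whose
augmented matrix can be row reduced:

$$[\,1 \;\; -2 \;\; 3\,]\begin{bmatrix} a \\ b \\ c \end{bmatrix} = 0, \quad [\,2 \;\; 0 \;\; 1\,]\begin{bmatrix} a \\ b \\ c \end{bmatrix} = 0, \quad \begin{bmatrix} 1 & -2 & 3 & 0 \\ 2 & 0 & 1 & 0 \end{bmatrix}$$

Row operations yield $a = -\tfrac{1}{2}c$, $b = \tfrac{5}{4}c$, with $c$ free. Set $c = 4$, for instance. Then
$\mathbf{n} = \begin{bmatrix} -2 \\ 5 \\ 4 \end{bmatrix}$ and $H_0 = [f : 0]$, where $f(\mathbf{x}) = -2x_1 + 5x_2 + 4x_3$.

The parallel hyperplane $H_1$ is $[f : d]$. To find $d$, use the fact that $\mathbf{v}_1$ is in $H_1$,
and compute $d = f(\mathbf{v}_1) = f(1, 1, 1) = -2(1) + 5(1) + 4(1) = 7$. As a check,
compute $f(\mathbf{v}_2) = f(2, -1, 4) = -2(2) + 5(-1) + 4(4) = 16 - 9 = 7$. Observe $f(\mathbf{v}_3) =
7$ also. ■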


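The procedure in Example 6 generalizes to higher dimensions. However, for the
special case of $\mathbb{R}^3$, one can also use the cross-product formula to compute $\mathbf{n}$, using a
symbolic determinant as a mnemonic device:

$$\mathbf{n} = (\mathbf{v}_2 - \mathbf{v}_1) \times (\mathbf{v}_3 - \mathbf{v}_1) = \begin{vmatrix} 1 & 2 & \mathbf{i} \\ -2 & 0 & \mathbf{j} \\ 3 & 1 & \mathbf{k} \end{vmatrix} = \begin{vmatrix} -2 & 0 \\ 3 & 1 \end{vmatrix}\mathbf{i} - \begin{vmatrix} 1 & 2 \\ 3 & 1 \end{vmatrix}\mathbf{j} + \begin{vmatrix} 1 & 2 \\ -2 & 0 \end{vmatrix}\mathbf{k} = -2\mathbf{i} + 5\mathbf{j} + 4\mathbf{k} = \begin{bmatrix} -2 \\ 5 \\ 4 \end{bmatrix}$$

If only the formula for $f$ is needed, the cross-product calculation may be written
as an ordinary determinant:

$$f(x_1, x_2, x_3) = \begin{vmatrix} 1 & 2 & x_1 \\ -2 & 0 & x_2 \\ 3 & 1 & x_3 \end{vmatrix} = \begin{vmatrix} -2 & 0 \\ 3 & 1 \end{vmatrix}x_1 - \begin{vmatrix} 1 & 2 \\ 3 & 1 \end{vmatrix}x_2 + \begin{vmatrix} 1 & 2 \\ -2 & 0 \end{vmatrix}x_3 = -2x_1 + 5x_2 + 4x_3$$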
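The constructions of Examples 5 and 6 can be sketched in Python as follows. This is an illustration, not a general-purpose routine: the function names are invented, the points are assumed to be distinct (and, in $\mathbb{R}^3$, not collinear), and the normal returned is only one of many valid choices, since any nonzero multiple of $\mathbf{n}$, with $d$ scaled to match, describes the same hyperplane.

```python
def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def implicit_line(v1, v2):
    """Return (n, d) with n.x = d for the line through v1 and v2 in R^2."""
    dx, dy = sub(v2, v1)
    n = (-dy, dx)                      # orthogonal to the direction v2 - v1
    return n, dot(n, v1)

def implicit_plane(v1, v2, v3):
    """Return (n, d) with n.x = d for the plane through three points in R^3."""
    u, w = sub(v2, v1), sub(v3, v1)
    n = (u[1] * w[2] - u[2] * w[1],    # cross product u x w
         u[2] * w[0] - u[0] * w[2],
         u[0] * w[1] - u[1] * w[0])
    return n, dot(n, v1)

print(implicit_line((1, 2), (6, 0)))                     # ((2, 5), 12)
print(implicit_plane((1, 1, 1), (2, -1, 4), (3, 1, 2)))  # ((-2, 5, 4), 7)
```

Both outputs agree with the functionals $f(x, y) = 2x + 5y$, $d = 12$, and $f(\mathbf{x}) = -2x_1 + 5x_2 + 4x_3$, $d = 7$, found in the examples.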


So far, every hyperplane examined has been described as $[f : d]$ for some linear
functional $f$ and some $d$ in $\mathbb{R}$, or equivalently as $\{\mathbf{x} \in \mathbb{R}^n : \mathbf{n}\cdot\mathbf{x} = d\}$ for some $\mathbf{n}$ in $\mathbb{R}^n$.
The following theorem shows that every hyperplane has these equivalent descriptions.


THEOREM 11


A subset $H$ of $\mathbb{R}^n$ is a hyperplane if and only if $H = [f : d]$ for some nonzero
linear functional $f$ and some scalar $d$ in $\mathbb{R}$. Thus, if $H$ is a hyperplane, there
exist a nonzero vector $\mathbf{n}$ and a real number $d$ such that $H = \{\mathbf{x} : \mathbf{n}\cdot\mathbf{x} = d\}$.
PROOF Suppose that $H$ is a hyperplane, take $\mathbf{p} \in H$, and let $H_0 = H - \mathbf{p}$. Then $H_0$
is an $(n - 1)$-dimensional subspace. Next, take any point $\mathbf{y}$ that is not in $H_0$. By the
Orthogonal Decomposition Theorem in Section 6.3,

$$\mathbf{y} = \mathbf{y}_1 + \mathbf{n}$$

where $\mathbf{y}_1$ is a vector in $H_0$ and $\mathbf{n}$ is orthogonal to every vector in $H_0$. The function $f$
defined by

$$f(\mathbf{x}) = \mathbf{n}\cdot\mathbf{x} \quad \text{for } \mathbf{x} \in \mathbb{R}^n$$

is a linear functional, by properties of the inner product. Now, $[f : 0]$ is a hyperplane that
contains $H_0$, by construction of $\mathbf{n}$. It follows that

$$H_0 = [f : 0]$$

[Argument: $H_0$ contains a basis $S$ of $n - 1$ vectors, and since $S$ is in the $(n - 1)$-dimensional
subspace $[f : 0]$, $S$ must also be a basis for $[f : 0]$, by the Basis Theorem.]
Finally, let $d = f(\mathbf{p}) = \mathbf{n}\cdot\mathbf{p}$. Then, as in (3) shown earlier,

$$[f : d] = [f : 0] + \mathbf{p} = H_0 + \mathbf{p} = H$$

The converse statement that $[f : d]$ is a hyperplane follows from (1) and (3) above. ■

Many important applications of hyperplanes depend on the possibility of “separating”
two sets by a hyperplane. Intuitively, this means that one of the sets is on one side
of the hyperplane and the other set is on the other side. The following terminology and
notation will help to make this idea more precise.




FIGURE 3
The set $S$ is closed and bounded.


TOPOLOGY IN $\mathbb{R}^n$: TERMS AND FACTS


For any point $\mathbf{p}$ in $\mathbb{R}^n$ and any real $\delta > 0$, the open ball $B(\mathbf{p}, \delta)$ with center $\mathbf{p}$ and
radius $\delta$ is given by

$$B(\mathbf{p}, \delta) = \{\mathbf{x} : \|\mathbf{x} - \mathbf{p}\| < \delta\}$$

Given a set $S$ in $\mathbb{R}^n$, a point $\mathbf{p}$ is an interior point of $S$ if there exists a $\delta > 0$
such that $B(\mathbf{p}, \delta) \subset S$. If every open ball centered at $\mathbf{p}$ intersects both $S$ and the
complement of $S$, then $\mathbf{p}$ is called a boundary point of $S$. A set is open if it
contains none of its boundary points. (This is equivalent to saying that all of its
points are interior points.) A set is closed if it contains all of its boundary points.
(If $S$ contains some but not all of its boundary points, then $S$ is neither open nor
closed.) A set $S$ is bounded if there exists a $\delta > 0$ such that $S \subset B(\mathbf{0}, \delta)$. A set
in $\mathbb{R}^n$ is compact if it is closed and bounded.


Theorem: The convex hull of an open set is open, and the convex hull of a
compact set is compact. (The convex hull of a closed set need not be closed. See
Exercise 27.)


EXAMPLE 7 Let 




conv 










as shown in Figure 3. Then is an interior point since i?(p, |) C S. The point p 2 
is a boundary point since every open ball centered at $\mathbf{p}_2$ intersects both $S$ and the
complement of $S$. The set $S$ is closed since it contains all its boundary points. The
set $S$ is bounded since $S \subset B(\mathbf{0}, 3)$. Thus $S$ is also compact. ■
Notation: If $f$ is a linear functional, then $f(A) \le d$ means $f(\mathbf{x}) \le d$ for each $\mathbf{x} \in A$.
Corresponding notations will be used when the inequalities are reversed or when they
are strict.


DEFINITION


The hyperplane $H = [f : d]$ separates two sets $A$ and $B$ if one of the following
holds:

(i) $f(A) \le d$ and $f(B) \ge d$, or

(ii) $f(A) \ge d$ and $f(B) \le d$.

If in the conditions above all the weak inequalities are replaced by strict
inequalities, then $H$ is said to strictly separate $A$ and $B$.


Notice that strict separation requires that the two sets be disjoint, while mere
separation does not. Indeed, if two circles in the plane are externally tangent, then their
common tangent line separates them (but does not separate them strictly).

Although it is necessary that two sets be disjoint in order to strictly separate them,
this condition is not sufficient, even for closed convex sets. For example, let

$$A = \left\{ \begin{bmatrix} x \\ y \end{bmatrix} : x \ge \frac{1}{2} \text{ and } \frac{1}{x} \le y \le 2 \right\} \quad \text{and} \quad B = \left\{ \begin{bmatrix} x \\ y \end{bmatrix} : x \ge 0 \text{ and } y = 0 \right\}$$

Then $A$ and $B$ are disjoint closed convex sets, but they cannot be strictly separated
by a hyperplane (line in $\mathbb{R}^2$). See Figure 4. Thus the problem of separating (or strictly
separating) two sets by a hyperplane is more complex than it might at first appear.


FIGURE 4 Disjoint closed convex sets.


There are many interesting conditions on the sets $A$ and $B$ that imply the existence
of a separating hyperplane, but the following two theorems are sufficient for this section.
The proof of the first theorem requires quite a bit of preliminary material,³ but the second
theorem follows easily from the first.


THEOREM 12


Suppose $A$ and $B$ are nonempty convex sets such that $A$ is compact and $B$ is
closed. Then there exists a hyperplane $H$ that strictly separates $A$ and $B$ if and
only if $A \cap B = \emptyset$.


THEOREM 13


Suppose $A$ and $B$ are nonempty compact sets. Then there exists a hyperplane that
strictly separates $A$ and $B$ if and only if $(\text{conv } A) \cap (\text{conv } B) = \emptyset$.


³ A proof of Theorem 12 is given in Steven R. Lay, Convex Sets and Their Applications (New York: John
Wiley & Sons, 1982; Mineola, NY: Dover Publications, 2007), pp. 34-39.
















PROOF Suppose that $(\text{conv } A) \cap (\text{conv } B) = \emptyset$. Since the convex hull of a compact
set is compact, Theorem 12 ensures that there is a hyperplane $H$ that strictly separates
conv $A$ and conv $B$. Clearly, $H$ also strictly separates the smaller sets $A$ and $B$.

Conversely, suppose the hyperplane $H = [f : d]$ strictly separates $A$ and $B$. Without
loss of generality, assume that $f(A) < d$ and $f(B) > d$. Let $\mathbf{x} = c_1\mathbf{x}_1 + \cdots + c_k\mathbf{x}_k$
be any convex combination of elements of $A$. Then

$$f(\mathbf{x}) = c_1 f(\mathbf{x}_1) + \cdots + c_k f(\mathbf{x}_k) < c_1 d + \cdots + c_k d = d$$

since $c_1 + \cdots + c_k = 1$. Thus $f(\text{conv } A) < d$. Likewise, $f(\text{conv } B) > d$, so $H =
[f : d]$ strictly separates conv $A$ and conv $B$. By Theorem 12, conv $A$ and conv $B$ must
be disjoint. ■

EXAMPLE 8 Let

$$\mathbf{a}_1 = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix},\quad \mathbf{a}_2 = \begin{bmatrix} -3 \\ 2 \\ 1 \end{bmatrix},\quad \mathbf{a}_3 = \begin{bmatrix} 3 \\ 4 \\ 0 \end{bmatrix},\quad \mathbf{b}_1 = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix},\quad \text{and} \quad \mathbf{b}_2 = \begin{bmatrix} 2 \\ -1 \\ 5 \end{bmatrix},$$

and let $A = \{\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3\}$ and $B = \{\mathbf{b}_1, \mathbf{b}_2\}$. Show that the hyperplane $H = [f : 5]$, where
$f(x_1, x_2, x_3) = 2x_1 - 3x_2 + x_3$, does not separate $A$ and $B$. Is there a hyperplane
parallel to $H$ that does separate $A$ and $B$? Do the convex hulls of $A$ and $B$ intersect?

SOLUTION Evaluate the linear functional $f$ at each of the points in $A$ and $B$:

$$f(\mathbf{a}_1) = 2, \quad f(\mathbf{a}_2) = -11, \quad f(\mathbf{a}_3) = -6, \quad f(\mathbf{b}_1) = 4, \quad \text{and} \quad f(\mathbf{b}_2) = 12$$

Since $f(\mathbf{b}_1) = 4$ is less than 5 and $f(\mathbf{b}_2) = 12$ is greater than 5, points of $B$ lie on both
sides of $H = [f : 5]$ and so $H$ does not separate $A$ and $B$.

Since $f(A) < 3$ and $f(B) > 3$, the parallel hyperplane $[f : 3]$ strictly separates $A$
and $B$. By Theorem 13, $(\text{conv } A) \cap (\text{conv } B) = \emptyset$.

Caution: If there were no hyperplane parallel to $H$ that strictly separated $A$ and $B$,
this would not necessarily imply that their convex hulls intersect. It might be that some
other hyperplane not parallel to $H$ would strictly separate them. ■
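The check in Example 8 amounts to comparing the values of $f$ on the two finite sets. The following Python sketch, with the invented helper `strictly_separating_levels`, returns the open interval of levels $d$ for which $[f : d]$ strictly separates $A$ and $B$, or `None` when no hyperplane parallel to $[f : 0]$ does so. As the caution above notes, a result of `None` says nothing about hyperplanes with other normal vectors.

```python
def f(x):
    """The linear functional of Example 8."""
    return 2 * x[0] - 3 * x[1] + x[2]

def strictly_separating_levels(f, A, B):
    """Return (lo, hi) such that [f : d] strictly separates the finite sets
    A and B for every d with lo < d < hi, or None if no such d exists."""
    fa = [f(a) for a in A]
    fb = [f(b) for b in B]
    if max(fa) < min(fb):
        return max(fa), min(fb)
    if max(fb) < min(fa):
        return max(fb), min(fa)
    return None

A = [(2, 1, 1), (-3, 2, 1), (3, 4, 0)]
B = [(1, 0, 2), (2, -1, 5)]
print([f(a) for a in A], [f(b) for b in B])
print(strictly_separating_levels(f, A, B))
```

The interval is $(2, 4)$: the level $d = 3$ used in the solution lies inside it, while $d = 5$ does not.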


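PRACTICE PROBLEM


Let $\mathbf{p}_1 = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}$, $\mathbf{p}_2 = \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}$, $\mathbf{n}_1 = \begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix}$, and $\mathbf{n}_2 = \begin{bmatrix} -2 \\ 1 \\ 3 \end{bmatrix}$; let $H_1$ be the
hyperplane (plane) in $\mathbb{R}^3$ passing through the point $\mathbf{p}_1$ and having normal vector $\mathbf{n}_1$; and let
$H_2$ be the hyperplane passing through the point $\mathbf{p}_2$ and having normal vector $\mathbf{n}_2$. Give
an explicit description of $H_1 \cap H_2$ by a formula that shows how to generate all points
in $H_1 \cap H_2$.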


8.4 EXERCISES 


1. Let L be the line in M 2 through the points 



Find a linear functional / and a real number d such that 

L = [ f:d ]. 


2. Let L be the line in M 2 through the points 



Find a linear functional / and a real number d such that 

L = [f:d]. 
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In Exercises 3 and 4, determine whether each set is open or closed 
or neither open nor closed. 


3. 

a. 

{(x.y) 

y> 0 } 




b. 

{(x.y) 

x — 2 and 1 < y 

< 

3} 


c. 

{(x.y) 

x — 2 and 1 < y 

< 

3} 


d. 

{(x.y) 

xy — 1 and x > 

0 } 



e. 

{(x,y) 

xy > 1 and x > 

0 } 


4. 

a. 

{(x.y) 

x 2 + y 2 = 1 } 




b. 

{(x.y) 

x 2 + y 2 > 1 } 




c. 

{(x.y) 

x 2 + y 2 < 1 and 

y 

> 0 } 


d. 

{(x.y) 

y > x 2 } 




e. 

{(x.y) 

y < x 2 } 




In Exercises 5 and 6, determine whether or not each set is compact
and whether or not it is convex.

5. Use the sets from Exercise 3.

6. Use the sets from Exercise 4.

In Exercises 7-10, let $H$ be the hyperplane through the listed
points. (a) Find a vector $\mathbf{n}$ that is normal to the hyperplane. (b) Find
a linear functional $f$ and a real number $d$ such that $H = [f:d]$.


7. 


9. 


1 


1 

CN 

1 _ 


-1 



1 

5 

4 

5 

-2 



1 

LO 

1 _ 


1 


i 

LTl 

1 _ 


1 


i 

CN 

1 _ 


1 


1 

0 


3 


2 


1 

1 

5 

1 

5 

2 

9 

1 

1 

o 

1 _ 


i 

o 

_i 


i 

o 

_i 


1 


8 . 


1 

2 

1 


4 

2 

3 


7 

-4 

4 


10 . 


1 


i 

CN 

1 _ 


1 


1 

_ 1 

2 


2 


3 


2 

0 


-1 

9 

2 

9 

-1 

i 

o 

i_ 


-3 


7 


-1 


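11. Let $\mathbf{p} = \begin{bmatrix} 1 \\ -3 \\ 1 \\ 2 \end{bmatrix}$, $\mathbf{n} = \begin{bmatrix} 2 \\ 1 \\ 5 \\ -1 \end{bmatrix}$, $\mathbf{v}_1 = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 1 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} -2 \\ 0 \\ 1 \\ 3 \end{bmatrix}$,
and $\mathbf{v}_3 = \begin{bmatrix} 1 \\ 4 \\ 0 \\ 4 \end{bmatrix}$, and let $H$ be the hyperplane in $\mathbb{R}^4$ with
normal $\mathbf{n}$ and passing through $\mathbf{p}$. Which of the points $\mathbf{v}_1$, $\mathbf{v}_2$,
and $\mathbf{v}_3$ are on the same side of $H$ as the origin, and which are
not?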



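12. Let $\mathbf{a}_1 = \begin{bmatrix} 2 \\ -1 \\ 5 \end{bmatrix}$, $\mathbf{a}_2 = \begin{bmatrix} 3 \\ 1 \\ 3 \end{bmatrix}$, $\mathbf{a}_3 = \begin{bmatrix} -1 \\ 6 \\ 0 \end{bmatrix}$, $\mathbf{b}_1 = \begin{bmatrix} 0 \\ 5 \\ -1 \end{bmatrix}$,
$\mathbf{b}_2 = \begin{bmatrix} 1 \\ -3 \\ -2 \end{bmatrix}$, $\mathbf{b}_3 = \begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix}$, and $\mathbf{n} = \begin{bmatrix} 3 \\ 1 \\ -2 \end{bmatrix}$, and let
$A = \{\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3\}$ and $B = \{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3\}$. Find a hyperplane $H$
with normal $\mathbf{n}$ that separates $A$ and $B$. Is there a hyperplane
parallel to $H$ that strictly separates $A$ and $B$?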


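13. Let $\mathbf{p}_1 = \begin{bmatrix} 2 \\ -3 \\ 1 \\ 2 \end{bmatrix}$, $\mathbf{p}_2 = \begin{bmatrix} 1 \\ 2 \\ -1 \\ 3 \end{bmatrix}$, $\mathbf{n}_1 = \begin{bmatrix} 1 \\ 2 \\ 4 \\ 2 \end{bmatrix}$, and
$\mathbf{n}_2 = \begin{bmatrix} 2 \\ 3 \\ 1 \\ 5 \end{bmatrix}$; let $H_1$ be the hyperplane in $\mathbb{R}^4$ through $\mathbf{p}_1$ with
normal $\mathbf{n}_1$; and let $H_2$ be the hyperplane through $\mathbf{p}_2$ with
normal $\mathbf{n}_2$. Give an explicit description of $H_1 \cap H_2$. [Hint:
Find a point $\mathbf{p}$ in $H_1 \cap H_2$ and two linearly independent
vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ that span a subspace parallel to the
2-dimensional flat $H_1 \cap H_2$.]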


14. Let $F_1$ and $F_2$ be 4-dimensional flats in $\mathbb{R}^6$, and suppose that
$F_1 \cap F_2 \ne \varnothing$. What are the possible dimensions of $F_1 \cap F_2$?

In Exercises 15-20, write a formula for a linear functional $f$ and
specify a number $d$, so that $[f:d]$ is the hyperplane $H$ described
in the exercise.


15. Let $A$ be the $1 \times 4$ matrix $\begin{bmatrix} 1 & -3 & 4 & -2 \end{bmatrix}$ and let $b = 5$. Let
$H = \{\mathbf{x} \text{ in } \mathbb{R}^4 : A\mathbf{x} = b\}$.


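16. Let $A$ be the $1 \times 5$ matrix $\begin{bmatrix} 2 & 5 & -3 & 0 & 6 \end{bmatrix}$. Note that
Nul $A$ is in $\mathbb{R}^5$. Let $H = \operatorname{Nul} A$.

17. Let $H$ be the plane in $\mathbb{R}^3$ spanned by the rows of $B = \begin{bmatrix} 1 & 3 & 5 \\ 0 & 2 & 4 \end{bmatrix}$.
That is, $H = \operatorname{Row} B$. [Hint: How is $H$
related to Nul $B$? See Section 6.1.]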

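18. Let $H$ be the plane in $\mathbb{R}^3$ spanned by the rows of $B = \begin{bmatrix} 1 & 4 & -5 \\ 0 & -2 & 8 \end{bmatrix}$.
That is, $H = \operatorname{Row} B$.


19. Let $H$ be the column space of the matrix $B = \begin{bmatrix} 1 & 0 \\ 4 & 2 \\ -7 & -6 \end{bmatrix}$.
That is, $H = \operatorname{Col} B$. [Hint: How is Col $B$ related to Nul $B^T$?
See Section 6.1.]


20. Let $H$ be the column space of the matrix $B = \begin{bmatrix} 1 & 0 \\ 5 & 2 \\ -4 & -4 \end{bmatrix}$.
That is, $H = \operatorname{Col} B$.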

In Exercises 21 and 22, mark each statement True or False. Justify
each answer.

21. a. A linear transformation from $\mathbb{R}$ to $\mathbb{R}^n$ is called a linear
functional.

b. If $f$ is a linear functional defined on $\mathbb{R}^n$, then there exists
a real number $k$ such that $f(\mathbf{x}) = k\mathbf{x}$ for all $\mathbf{x}$ in $\mathbb{R}^n$.

c. If a hyperplane strictly separates sets $A$ and $B$, then
$A \cap B = \varnothing$.

d. If $A$ and $B$ are closed convex sets and $A \cap B = \varnothing$, then
there exists a hyperplane that strictly separates $A$ and $B$.

22. a. If $d$ is a real number and $f$ is a nonzero linear functional
defined on $\mathbb{R}^n$, then $[f:d]$ is a hyperplane in $\mathbb{R}^n$.

b. Given any vector $\mathbf{n}$ and any real number $d$, the set
$\{\mathbf{x} : \mathbf{n} \cdot \mathbf{x} = d\}$ is a hyperplane.

c. If $A$ and $B$ are nonempty disjoint sets such that $A$ is
compact and $B$ is closed, then there exists a hyperplane
that strictly separates $A$ and $B$.

d. If there exists a hyperplane $H$ such that $H$ does not
strictly separate two sets $A$ and $B$, then $(\operatorname{conv} A) \cap
(\operatorname{conv} B) \ne \varnothing$.


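23. Let $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 3 \\ 0 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 5 \\ 3 \end{bmatrix}$, and $\mathbf{p} = \begin{bmatrix} 4 \\ 1 \end{bmatrix}$. Find
a hyperplane $[f:d]$ (in this case, a line) that strictly separates
$\mathbf{p}$ from $\operatorname{conv}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$.


24. Repeat Exercise 23 for $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 5 \\ 1 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 4 \\ 4 \end{bmatrix}$,
and $\mathbf{p} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$.


25. Let $\mathbf{p} = \begin{bmatrix} 4 \\ 1 \end{bmatrix}$. Find a hyperplane $[f:d]$ that strictly separates
$B(\mathbf{0}, 3)$ and $B(\mathbf{p}, 1)$. [Hint: After finding $f$, show that
the point $\mathbf{v} = (1 - .75)\mathbf{0} + .75\mathbf{p}$ is neither in $B(\mathbf{0}, 3)$ nor in
$B(\mathbf{p}, 1)$.]


26. Let $\mathbf{q} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$ and $\mathbf{p} = \begin{bmatrix} 6 \\ 1 \end{bmatrix}$. Find a hyperplane $[f:d]$ that
strictly separates $B(\mathbf{q}, 3)$ and $B(\mathbf{p}, 1)$.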


27. Give an example of a closed subset $S$ of $\mathbb{R}^2$ such that conv $S$
is not closed.


28. Give an example of a compact set $A$ and a closed set $B$ in $\mathbb{R}^2$
such that $(\operatorname{conv} A) \cap (\operatorname{conv} B) = \varnothing$ but $A$ and $B$ cannot be
strictly separated by a hyperplane.


29. Prove that the open ball $B(\mathbf{p}, \delta) = \{\mathbf{x} : \|\mathbf{x} - \mathbf{p}\| < \delta\}$ is a
convex set. [Hint: Use the Triangle Inequality.]


30. Prove that the convex hull of a bounded set is bounded.


SOLUTION TO PRACTICE PROBLEM 

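First, compute $\mathbf{n}_1 \cdot \mathbf{p}_1 = -3$ and $\mathbf{n}_2 \cdot \mathbf{p}_2 = 7$. The hyperplane $H_1$ is the solution set of the
equation $x_1 + x_2 - 2x_3 = -3$, and $H_2$ is the solution set of the equation $-2x_1 + x_2 +
3x_3 = 7$. Then

\[ H_1 \cap H_2 = \{\mathbf{x} : x_1 + x_2 - 2x_3 = -3 \text{ and } -2x_1 + x_2 + 3x_3 = 7\} \]

This is an implicit description of $H_1 \cap H_2$. To find an explicit description, solve the
system of equations by row reduction:

\[ \begin{bmatrix} 1 & 1 & -2 & -3 \\ -2 & 1 & 3 & 7 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & -\tfrac{5}{3} & -\tfrac{10}{3} \\ 0 & 1 & -\tfrac{1}{3} & \tfrac{1}{3} \end{bmatrix} \]

Thus $x_1 = -\tfrac{10}{3} + \tfrac{5}{3}x_3$, $x_2 = \tfrac{1}{3} + \tfrac{1}{3}x_3$, $x_3 = x_3$. Let $\mathbf{p} = \begin{bmatrix} -\tfrac{10}{3} \\ \tfrac{1}{3} \\ 0 \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} \tfrac{5}{3} \\ \tfrac{1}{3} \\ 1 \end{bmatrix}$. The
general solution can be written as $\mathbf{x} = \mathbf{p} + x_3\mathbf{v}$. Thus $H_1 \cap H_2$ is the line through $\mathbf{p}$ in
the direction of $\mathbf{v}$. Note that $\mathbf{v}$ is orthogonal to both $\mathbf{n}_1$ and $\mathbf{n}_2$.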
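The explicit description can be sanity-checked with exact arithmetic. The sketch below assumes the equations of $H_1$ and $H_2$ and the vectors $\mathbf{p}$ and $\mathbf{v}$ stated in the solution; the helper names are made up for this illustration. It also uses the cross product $\mathbf{n}_1 \times \mathbf{n}_2$, which is not the method of the solution but gives an independent direction vector for the line in $\mathbb{R}^3$.

```python
from fractions import Fraction as F

n1, n2 = (1, 1, -2), (-2, 1, 3)      # normals of H1 and H2
d1, d2 = -3, 7                       # H1: n1 . x = -3,  H2: n2 . x = 7
p = (F(-10, 3), F(1, 3), F(0))       # particular point from the solution
v = (F(5, 3), F(1, 3), F(1))         # direction vector from the solution


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Cross product in R^3; orthogonal to both a and b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def point_on_line(t):
    """The point p + t v."""
    return tuple(pi + t * vi for pi, vi in zip(p, v))


w = cross(n1, n2)
print(w)                                   # a multiple of v
for t in (F(-2), F(0), F(1, 2), F(7)):
    x = point_on_line(t)
    print(dot(n1, x) == d1, dot(n2, x) == d2)
```

Every sampled point satisfies both equations, and `w` is three times `v`, which agrees with the remark that $\mathbf{v}$ is orthogonal to both normals.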


8.5 POLYTOPES 


This section studies geometric properties of an important class of compact convex sets
called polytopes. These sets arise in all sorts of applications, including game theory
(Section 9.1), linear programming (Sections 9.2 to 9.4), and more general optimization
problems, such as the design of feedback controls for engineering systems.

A polytope in $\mathbb{R}^n$ is the convex hull of a finite set of points. In $\mathbb{R}^2$, a polytope
is simply a polygon. In $\mathbb{R}^3$, a polytope is called a polyhedron. Important features of
a polyhedron are its faces, edges, and vertices. For example, the cube has 6 square
faces, 12 edges, and 8 vertices. The following definitions provide terminology for higher
dimensions as well as $\mathbb{R}^2$ and $\mathbb{R}^3$. Recall that the dimension of a set in $\mathbb{R}^n$ is the dimension
of the smallest flat that contains it. Also, note that a polytope is a special type of
compact convex set, because a finite set in $\mathbb{R}^n$ is compact and the convex hull of this set
is compact, by the theorem in the topology terms and facts box in Section 8.4.


DEFINITION Let $S$ be a compact convex subset of $\mathbb{R}^n$. A nonempty subset $F$ of $S$ is called
a (proper) face of $S$ if $F \ne S$ and there exists a hyperplane $H = [f:d]$ such
that $F = S \cap H$ and either $f(S) \le d$ or $f(S) \ge d$. The hyperplane $H$ is called
a supporting hyperplane to $S$. If the dimension of $F$ is $k$, then $F$ is called a
$k$-face of $S$.

If $P$ is a polytope of dimension $k$, then $P$ is called a $k$-polytope. A 0-face of $P$
is called a vertex (plural: vertices), a 1-face is an edge, and a $(k - 1)$-dimensional
face is a facet of $P$.


EXAMPLE 1 Suppose $S$ is a cube in $\mathbb{R}^3$. When a plane $H$ is translated through
$\mathbb{R}^3$ until it just touches (supports) the cube but does not cut through the interior of the
cube, there are three possibilities for $H \cap S$, depending on the orientation of $H$. (See
Figure 1.)


$H \cap S$ may be a 2-dimensional square face (facet) of the cube.

$H \cap S$ may be a 1-dimensional edge of the cube.

$H \cap S$ may be a 0-dimensional vertex of the cube. ■


FIGURE 1 (panel labels: $H \cap S$ is 2-dimensional; $H \cap S$ is 0-dimensional)


Most applications of polytopes involve the vertices in some way, because they have 
a special property that is identified in the following definition. 


DEFINITION Let $S$ be a convex set. A point $\mathbf{p}$ in $S$ is called an extreme point of $S$ if $\mathbf{p}$ is
not in the interior of any line segment that lies in $S$. More precisely, if $\mathbf{x}, \mathbf{y} \in S$
and $\mathbf{p} \in \overline{\mathbf{x}\mathbf{y}}$, then $\mathbf{p} = \mathbf{x}$ or $\mathbf{p} = \mathbf{y}$. The set of all extreme points of $S$ is called the
profile of $S$.

A vertex of any compact convex set $S$ is automatically an extreme point of $S$. This
fact is proved during the proof of Theorem 14, below. In working with a polytope, say
$P = \operatorname{conv}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ for $\mathbf{v}_1, \ldots, \mathbf{v}_k$ in $\mathbb{R}^n$, it is usually helpful to know that $\mathbf{v}_1, \ldots, \mathbf{v}_k$
are the extreme points of $P$. However, such a list might contain extraneous points. For
example, some vector $\mathbf{v}_i$ could be the midpoint of an edge of the polytope. Of course,
in this case $\mathbf{v}_i$ is not really needed to generate the convex hull. The following definition
describes the property of the vertices that will make them all extreme points.


DEFINITION The set $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ is a minimal representation of the polytope $P$ if $P =
\operatorname{conv}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ and for each $i = 1, \ldots, k$, $\mathbf{v}_i \notin \operatorname{conv}\{\mathbf{v}_j : j \ne i\}$.

Every polytope has a minimal representation. For if $P = \operatorname{conv}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ and if
some $\mathbf{v}_i$ is a convex combination of the other points, then $\mathbf{v}_i$ may be deleted from the
set of points without changing the convex hull. This process may be repeated until the
minimal representation is left. It can be shown that the minimal representation is unique.
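For points in the plane, the deletion process can be carried out by a standard convex hull routine. The sketch below is one possible illustration, not the procedure described above: it uses Andrew's monotone chain algorithm, restricted to $\mathbb{R}^2$, and discards every point that is a convex combination of the others (interior points, points in the middle of an edge, and duplicates). The function name `minimal_representation` and the sample points are assumptions made for this example.

```python
def cross(o, a, b):
    """Signed area test: positive if o -> a -> b turns counterclockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def minimal_representation(points):
    """Return the extreme points of conv(points) in R^2, in counterclockwise order."""
    pts = sorted(set(points))            # remove duplicates, sort by x then y
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # "<= 0" also removes collinear points, such as the midpoint of an edge
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # the last point of each chain is the first point of the other
    return lower[:-1] + upper[:-1]


# A square given with an edge midpoint (2, 0) and an interior point (2, 2).
square = [(0, 0), (4, 0), (2, 0), (4, 4), (0, 4), (2, 2)]
print(minimal_representation(square))
```

With integer or rational coordinates the test `cross(...) <= 0` is exact; with floating-point input a tolerance would be needed.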

THEOREM 14 Suppose $M = \{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ is the minimal representation of the polytope $P$. Then
the following three statements are equivalent:

a. $\mathbf{p} \in M$.

b. $\mathbf{p}$ is a vertex of $P$.

c. $\mathbf{p}$ is an extreme point of $P$.


FIGURE 2


PROOF (a) $\Rightarrow$ (b) Suppose $\mathbf{p} \in M$ and let $Q = \operatorname{conv}\{\mathbf{v} : \mathbf{v} \in M \text{ and } \mathbf{v} \ne \mathbf{p}\}$. It follows
from the definition of $M$ that $\mathbf{p} \notin Q$, and since $Q$ is compact, Theorem 13 implies
the existence of a hyperplane $H'$ that strictly separates $\{\mathbf{p}\}$ and $Q$. Let $H$ be the hyperplane
through $\mathbf{p}$ parallel to $H'$. See Figure 2.

Then $Q$ lies in one of the closed half-spaces $H^+$ bounded by $H$ and so $P \subseteq H^+$.
Thus $H$ supports $P$ at $\mathbf{p}$. Furthermore, $\mathbf{p}$ is the only point of $P$ that can lie on $H$, so
$H \cap P = \{\mathbf{p}\}$ and $\mathbf{p}$ is a vertex of $P$.

(b) $\Rightarrow$ (c) Let $\mathbf{p}$ be a vertex of $P$. Then there exists a hyperplane $H = [f:d]$ such
that $H \cap P = \{\mathbf{p}\}$ and $f(P) \ge d$. If $\mathbf{p}$ were not an extreme point, then there would
exist points $\mathbf{x}$ and $\mathbf{y}$ in $P$ such that $\mathbf{p} = (1 - c)\mathbf{x} + c\mathbf{y}$ with $0 < c < 1$. That is,

\[ c\mathbf{y} = \mathbf{p} - (1 - c)\mathbf{x} \quad \text{and} \quad \mathbf{y} = \frac{1}{c}\,\mathbf{p} - \left(\frac{1}{c} - 1\right)\mathbf{x} \]

It follows that $f(\mathbf{y}) = \frac{1}{c} f(\mathbf{p}) - \left(\frac{1}{c} - 1\right) f(\mathbf{x})$. But $f(\mathbf{p}) = d$ and $f(\mathbf{x}) \ge d$, so

\[ f(\mathbf{y}) \le \frac{1}{c}\,d - \left(\frac{1}{c} - 1\right) d = d \]

On the other hand, $\mathbf{y} \in P$, so $f(\mathbf{y}) \ge d$. It follows that $f(\mathbf{y}) = d$ and that $\mathbf{y} \in H \cap P$.
This contradicts the fact that $\mathbf{p}$ is a vertex. So $\mathbf{p}$ must be an extreme point. (Note that
this part of the proof does not depend on $P$ being a polytope. It holds for any compact
convex set.)

(c) $\Rightarrow$ (a) It is clear that any extreme point of $P$ must be a member of $M$. ■

EXAMPLE 2 Recall that the profile of a set $S$ is the set of extreme points of $S$.
Theorem 14 shows that the profile of a polygon in $\mathbb{R}^2$ is the set of vertices. (See Figure 3.)
The profile of a closed ball is its boundary. An open set has no extreme points, so its
profile is empty. A closed half-space has no extreme points, so its profile is empty. ■






Exercise 18 asks you to show that a point $\mathbf{p}$ in a convex set $S$ is an extreme point
of $S$ if and only if, when $\mathbf{p}$ is removed from $S$, the remaining points still form a convex
set. It follows that if $S^*$ is any subset of $S$ such that conv $S^*$ is equal to $S$, then $S^*$ must
contain the profile of $S$. The sets in Example 2 show that in general $S^*$ may have to be
larger than the profile of $S$. It is true, however, that when $S$ is compact, we may actually
take $S^*$ to be the profile of $S$, as Theorem 15 will show. Thus every nonempty compact
convex set $S$ has an extreme point, and the set of all extreme points is the smallest subset
of $S$ whose convex hull is equal to $S$.


THEOREM 15


Let $S$ be a nonempty compact convex set. Then $S$ is the convex hull of its profile
(the set of extreme points of $S$).


PROOF The proof is by induction on the dimension of the set $S$.¹ ■

One important application of Theorem 15 is the following theorem. It is one of the
key theoretical results in the development of linear programming. Linear functionals
are continuous, and continuous functions always attain their maximum and minimum
on a compact set. The significance of Theorem 16 is that for compact convex sets, the
maximum (and minimum) is actually attained at an extreme point of $S$.


THEOREM 16


Let $f$ be a linear functional defined on a nonempty compact convex set $S$. Then
there exist extreme points $\mathbf{v}$ and $\mathbf{w}$ of $S$ such that

\[ f(\mathbf{v}) = \max_{\mathbf{x} \in S} f(\mathbf{x}) \quad \text{and} \quad f(\mathbf{w}) = \min_{\mathbf{x} \in S} f(\mathbf{x}) \]


PROOF Assume that $f$ attains its maximum $m$ on $S$ at some point $\mathbf{v}'$ in $S$. That is,
$f(\mathbf{v}') = m$. We wish to show that there exists an extreme point in $S$ with the same
property. By Theorem 15, $\mathbf{v}'$ is a convex combination of the extreme points of $S$. That
is, there exist extreme points $\mathbf{v}_1, \ldots, \mathbf{v}_k$ of $S$ and nonnegative $c_1, \ldots, c_k$ such that

\[ \mathbf{v}' = c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k \quad \text{with} \quad c_1 + \cdots + c_k = 1 \]

If none of the extreme points of $S$ satisfies $f(\mathbf{v}) = m$, then

\[ f(\mathbf{v}_i) < m \quad \text{for } i = 1, \ldots, k \]

since $m$ is the maximum of $f$ on $S$. But then, because $f$ is linear,

\[ \begin{aligned} m = f(\mathbf{v}') &= f(c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k) \\ &= c_1 f(\mathbf{v}_1) + \cdots + c_k f(\mathbf{v}_k) \\ &< c_1 m + \cdots + c_k m = m(c_1 + \cdots + c_k) = m \end{aligned} \]

This contradiction implies that some extreme point $\mathbf{v}$ of $S$ must satisfy $f(\mathbf{v}) = m$.

The proof for $\mathbf{w}$ is similar. ■


¹ The details may be found in Steven R. Lay, Convex Sets and Their Applications (New York: John Wiley &
Sons, 1982; Mineola, NY: Dover Publications, 2007), p. 43.


EXAMPLE 3 Given points $\mathbf{p}_1 = \begin{bmatrix} -1 \\ 0 \end{bmatrix}$, $\mathbf{p}_2 = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$, and $\mathbf{p}_3 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ in $\mathbb{R}^2$, let $S =
\operatorname{conv}\{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$. For each linear functional $f$, find the maximum value $m$ of $f$ on the
set $S$, and find all points $\mathbf{x}$ in $S$ at which $f(\mathbf{x}) = m$.


a. $f_1(x_1, x_2) = x_1 + x_2$    b. $f_2(x_1, x_2) = -3x_1 + x_2$    c. $f_3(x_1, x_2) = x_1 + 2x_2$


SOLUTION By Theorem 16, the maximum value is attained at one of the extreme points
of $S$. So to find $m$, evaluate $f$ at each extreme point and select the largest value.

a. $f_1(\mathbf{p}_1) = -1$, $f_1(\mathbf{p}_2) = 4$, and $f_1(\mathbf{p}_3) = 3$, so $m_1 = 4$. Graph the line $f_1(x_1, x_2) =
m_1$, that is, $x_1 + x_2 = 4$, and note that $\mathbf{x} = \mathbf{p}_2$ is the only point in $S$ at which $f_1(\mathbf{x}) =
4$. See Figure 4(a).

b. $f_2(\mathbf{p}_1) = 3$, $f_2(\mathbf{p}_2) = -8$, and $f_2(\mathbf{p}_3) = -1$, so $m_2 = 3$. Graph the line $f_2(x_1, x_2) =
m_2$, that is, $-3x_1 + x_2 = 3$, and note that $\mathbf{x} = \mathbf{p}_1$ is the only point in $S$ at which
$f_2(\mathbf{x}) = 3$. See Figure 4(b).

c. $f_3(\mathbf{p}_1) = -1$, $f_3(\mathbf{p}_2) = 5$, and $f_3(\mathbf{p}_3) = 5$, so $m_3 = 5$. Graph the line $f_3(x_1, x_2) =
m_3$, that is, $x_1 + 2x_2 = 5$. Here, $f_3$ attains its maximum value at $\mathbf{p}_2$, at $\mathbf{p}_3$, and at
every point in the convex hull of $\mathbf{p}_2$ and $\mathbf{p}_3$. See Figure 4(c). ■
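The evaluation step in this solution is easy to automate. The following sketch assumes the vertex coordinates $\mathbf{p}_1 = (-1, 0)$, $\mathbf{p}_2 = (3, 1)$, $\mathbf{p}_3 = (1, 2)$, which are the unique points consistent with the nine functional values listed in the solution; the dictionary layout and the name `maximize` are choices made for this illustration.

```python
# Extreme points of S, inferred from the values f_i(p_j) given in the solution.
vertices = {"p1": (-1, 0), "p2": (3, 1), "p3": (1, 2)}

functionals = {
    "f1": lambda x1, x2: x1 + x2,
    "f2": lambda x1, x2: -3 * x1 + x2,
    "f3": lambda x1, x2: x1 + 2 * x2,
}


def maximize(f):
    """Evaluate f at every extreme point; return the maximum and the vertices attaining it."""
    values = {name: f(*point) for name, point in vertices.items()}
    m = max(values.values())
    return m, [name for name, value in values.items() if value == m]


for name, f in functionals.items():
    print(name, maximize(f))
```

When more than one vertex is returned, as for $f_3$, the maximum is attained on the whole convex hull of those vertices, not only at the vertices themselves.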


FIGURE 4


The situation illustrated in Example 3 for $\mathbb{R}^2$ also applies in higher dimensions. The
maximum value of a linear functional $f$ on a polytope $P$ occurs at the intersection of
a supporting hyperplane and $P$. This intersection is either a single extreme point of $P$,
or the convex hull of 2 or more extreme points of $P$. In either case, the intersection is a
polytope, and its extreme points form a subset of the extreme points of $P$.

By definition, a polytope is the convex hull of a finite set of points. This is an explicit
representation of the polytope since it identifies points in the set. A polytope may also
be represented implicitly as the intersection of a finite number of closed half-spaces.
Example 4 illustrates this in $\mathbb{R}^2$.

EXAMPLE 4 Let

\[ \mathbf{p}_1 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad \mathbf{p}_2 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad \text{and} \quad \mathbf{p}_3 = \begin{bmatrix} 3 \\ 2 \end{bmatrix} \]

in $\mathbb{R}^2$, and let $S = \operatorname{conv}\{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$. Simple algebra shows that the line through $\mathbf{p}_1$ and
$\mathbf{p}_2$ is given by $x_1 + x_2 = 1$, and $S$ is on the side of this line where

\[ x_1 + x_2 \ge 1 \quad \text{or, equivalently,} \quad -x_1 - x_2 \le -1. \]

Similarly, the line through $\mathbf{p}_2$ and $\mathbf{p}_3$ is $x_1 - x_2 = 1$, and $S$ is on the side where

\[ x_1 - x_2 \le 1 \]

Also, the line through $\mathbf{p}_3$ and $\mathbf{p}_1$ is $-x_1 + 3x_2 = 3$, and $S$ is on the side where

\[ -x_1 + 3x_2 \le 3. \]

See Figure 5. It follows that $S$ can be described as the solution set of the system of linear
inequalities

\[ \begin{aligned} -x_1 - x_2 &\le -1 \\ x_1 - x_2 &\le 1 \\ -x_1 + 3x_2 &\le 3 \end{aligned} \]

This system may be written as $A\mathbf{x} \le \mathbf{b}$, where

\[ A = \begin{bmatrix} -1 & -1 \\ 1 & -1 \\ -1 & 3 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad \text{and} \quad \mathbf{b} = \begin{bmatrix} -1 \\ 1 \\ 3 \end{bmatrix} \]

Note that an inequality between two vectors, such as $A\mathbf{x}$ and $\mathbf{b}$, applies to each of the
corresponding coordinates in those vectors. ■
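The coordinatewise meaning of $A\mathbf{x} \le \mathbf{b}$ translates directly into a membership test. The sketch below assumes the matrix $A = \begin{bmatrix} -1 & -1 \\ 1 & -1 \\ -1 & 3 \end{bmatrix}$ and vector $\mathbf{b} = (-1, 1, 3)$ obtained from the three inequalities of Example 4; the function name `in_polytope` is made up for this illustration.

```python
# Implicit description of S = conv{(0,1), (1,0), (3,2)} from Example 4.
A = [(-1, -1), (1, -1), (-1, 3)]
b = (-1, 1, 3)


def in_polytope(x, A=A, b=b):
    """True if every coordinate of A x is at most the matching coordinate of b."""
    return all(
        sum(a_i * x_i for a_i, x_i in zip(row, x)) <= b_j
        for row, b_j in zip(A, b)
    )


for point in [(0, 1), (1, 0), (3, 2), (1, 1), (0, 0), (3, 3)]:
    print(point, in_polytope(point))
```

The three vertices and the interior point $(1, 1)$ pass the test, while $(0, 0)$ and $(3, 3)$ each violate one inequality.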



In Chapter 9, it will be necessary to replace an implicit description of a polytope by
a minimal representation of the polytope, listing all the extreme points of the polytope.
In simple cases, a graphical solution is feasible. The following example shows how to
handle the situation when several points of interest are too close to identify easily on a
graph.


EXAMPLE 5 Let $P$ be the set of points in $\mathbb{R}^2$ that satisfy $A\mathbf{x} \le \mathbf{b}$, where

\[ A = \begin{bmatrix} 1 & 3 \\ 1 & 1 \\ 3 & 2 \end{bmatrix} \quad \text{and} \quad \mathbf{b} = \begin{bmatrix} 18 \\ 8 \\ 21 \end{bmatrix} \]

and $\mathbf{x} \ge \mathbf{0}$. Find the minimal representation of $P$.

SOLUTION The condition $\mathbf{x} \ge \mathbf{0}$ places $P$ in the first quadrant of $\mathbb{R}^2$, a typical condition
in linear programming problems. The three inequalities in $A\mathbf{x} \le \mathbf{b}$ involve three
boundary lines:

(1) $x_1 + 3x_2 = 18$    (2) $x_1 + x_2 = 8$    (3) $3x_1 + 2x_2 = 21$

All three lines have negative slopes, so a general idea of the shape of $P$ is easy to
visualize. Even a rough sketch of the graphs of these lines will reveal that $(0, 0)$, $(7, 0)$,
and $(0, 6)$ are vertices of the polytope $P$.

What about the intersections of the lines (1), (2), and (3)? Sometimes it is clear
from the graph which intersections to include. But if not, then the following algebraic
procedure will work well:

When an intersection point is found that corresponds to two inequalities, test it
in the other inequalities to see whether the point is in the polytope.

The intersection of (1) and (2) is $\mathbf{p}_{12} = (3, 5)$. Both coordinates are nonnegative,
so $\mathbf{p}_{12}$ satisfies all inequalities except possibly the third inequality. Test this:

\[ 3(3) + 2(5) = 19 \le 21 \]

This intersection point satisfies the inequality for (3), so $\mathbf{p}_{12}$ is in the polytope.

The intersection of (2) and (3) is $\mathbf{p}_{23} = (5, 3)$. This satisfies all inequalities except
possibly the inequality for (1). Test this:

\[ 1(5) + 3(3) = 14 \le 18 \]

This shows that $\mathbf{p}_{23}$ is in the polytope.

Finally, the intersection of (1) and (3) is $\mathbf{p}_{13} = \left(\tfrac{27}{7}, \tfrac{33}{7}\right)$. Test this in the inequality
for (2):

\[ 1\left(\tfrac{27}{7}\right) + 1\left(\tfrac{33}{7}\right) = \tfrac{60}{7} \approx 8.6 > 8 \]

Thus $\mathbf{p}_{13}$ does not satisfy the second inequality, which shows that $\mathbf{p}_{13}$ is not in $P$. In
conclusion, the minimal representation of the polytope $P$ is

\[ \left\{ \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 7 \\ 0 \end{bmatrix}, \begin{bmatrix} 5 \\ 3 \end{bmatrix}, \begin{bmatrix} 3 \\ 5 \end{bmatrix}, \begin{bmatrix} 0 \\ 6 \end{bmatrix} \right\} \]
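The algebraic procedure of Example 5 (intersect two boundary lines, then test the point in the remaining inequalities) can be sketched in code for polytopes in the plane. This is an illustration under stated assumptions, not a general vertex-enumeration algorithm: it works only in $\mathbb{R}^2$, it treats the conditions $x_1 \ge 0$ and $x_2 \ge 0$ as two extra boundary lines, and the function name `vertices_2d` is made up here. Exact fractions avoid rounding trouble at points such as $\left(\tfrac{27}{7}, \tfrac{33}{7}\right)$.

```python
from fractions import Fraction
from itertools import combinations


def vertices_2d(A, b):
    """Vertices of {x in R^2 : A x <= b, x >= 0}, found by intersecting pairs of boundary lines."""
    # Rewrite x1 >= 0 and x2 >= 0 as -x1 <= 0 and -x2 <= 0.
    rows = [tuple(Fraction(a) for a in row) for row in A]
    rows += [(Fraction(-1), Fraction(0)), (Fraction(0), Fraction(-1))]
    rhs = [Fraction(v) for v in b] + [Fraction(0), Fraction(0)]

    found = set()
    for i, j in combinations(range(len(rows)), 2):
        (a11, a12), (a21, a22) = rows[i], rows[j]
        det = a11 * a22 - a12 * a21
        if det == 0:
            continue                      # parallel lines never meet
        # Cramer's rule for the 2 x 2 system formed by boundary lines i and j
        x1 = (rhs[i] * a22 - a12 * rhs[j]) / det
        x2 = (a11 * rhs[j] - rhs[i] * a21) / det
        # Keep the point only if it satisfies every inequality.
        if all(r1 * x1 + r2 * x2 <= c for (r1, r2), c in zip(rows, rhs)):
            found.add((x1, x2))
    return sorted(found)


A = [(1, 3), (1, 1), (3, 2)]
b = (18, 8, 21)
print([(int(x1), int(x2)) for x1, x2 in vertices_2d(A, b)])
```

For the data of Example 5 the five surviving points are $(0,0)$, $(0,6)$, $(3,5)$, $(5,3)$, and $(7,0)$; the intersection of lines (1) and (3) is rejected because it fails inequality (2).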


The remainder of this section discusses the construction of two basic polytopes
in $\mathbb{R}^3$ (and higher dimensions). The first appears in linear programming problems, the
subject of Chapter 9. Both polytopes provide opportunities to visualize $\mathbb{R}^4$ in a remarkable
way.

Simplex

A simplex is the convex hull of an affinely independent finite set of vectors. To construct
a $k$-dimensional simplex (or $k$-simplex), proceed as follows:

0-simplex $S^0$: a single point $\{\mathbf{v}_1\}$

1-simplex $S^1$: $\operatorname{conv}(S^0 \cup \{\mathbf{v}_2\})$, with $\mathbf{v}_2$ not in aff $S^0$

2-simplex $S^2$: $\operatorname{conv}(S^1 \cup \{\mathbf{v}_3\})$, with $\mathbf{v}_3$ not in aff $S^1$

$\vdots$

$k$-simplex $S^k$: $\operatorname{conv}(S^{k-1} \cup \{\mathbf{v}_{k+1}\})$, with $\mathbf{v}_{k+1}$ not in aff $S^{k-1}$

The simplex $S^1$ is a line segment. The triangle $S^2$ comes from choosing a point
$\mathbf{v}_3$ that is not in the line containing $S^1$ and then forming the convex hull with $S^1$.
(See Figure 6.) The tetrahedron $S^3$ is produced by choosing a point $\mathbf{v}_4$ not in the plane
of $S^2$ and then forming the convex hull with $S^2$.


FIGURE 6

Before continuing, consider some of the patterns that are appearing. The triangle $S^2$
has three edges. Each of these edges is a line segment like $S^1$. Where do these three line
segments come from? One of them is $S^1$. One of them comes by joining the endpoint
$\mathbf{v}_2$ to the new point $\mathbf{v}_3$. The third comes from joining the other endpoint $\mathbf{v}_1$ to $\mathbf{v}_3$. You
might say that each endpoint in $S^1$ is stretched out into a line segment in $S^2$.

The tetrahedron $S^3$ in Figure 6 has four triangular faces. One of these is the original
triangle $S^2$, and the other three come from stretching the edges of $S^2$ out to the new
point $\mathbf{v}_4$. Notice too that the vertices of $S^2$ get stretched out into edges in $S^3$. The other
edges in $S^3$ come from the edges in $S^2$. This suggests how to “visualize” the four-dimensional
$S^4$.

The construction of $S^4$, called a pentatope, involves forming the convex hull of $S^3$
with a point $\mathbf{v}_5$ not in the 3-space of $S^3$. A complete picture is impossible, of course,
but Figure 7 is suggestive: $S^4$ has five vertices, and any four of the vertices determine
a facet in the shape of a tetrahedron. For example, the figure emphasizes the facet with
vertices $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_4$, and $\mathbf{v}_5$ and the facet with vertices $\mathbf{v}_2$, $\mathbf{v}_3$, $\mathbf{v}_4$, and $\mathbf{v}_5$. There are five
such facets. Figure 7 identifies all ten edges of $S^4$, and these can be used to visualize
the ten triangular faces.


FIGURE 7 The 4-dimensional simplex $S^4$ projected onto $\mathbb{R}^2$, with two
tetrahedral facets emphasized.

Figure 8 shows another representation of the 4-dimensional simplex $S^4$. This time
the fifth vertex appears “inside” the tetrahedron $S^3$. The highlighted tetrahedral facets
also appear to be “inside” $S^3$.


FIGURE 8 The fifth vertex of $S^4$ is “inside” $S^3$.


Hypercube


Let $I_i = \overline{\mathbf{0}\mathbf{e}_i}$ be the line segment from the origin $\mathbf{0}$ to the standard basis vector $\mathbf{e}_i$ in $\mathbb{R}^n$.
Then for $k$ such that $1 \le k \le n$, the vector sum²

\[ C^k = I_1 + I_2 + \cdots + I_k \]

is called a $k$-dimensional hypercube.

To visualize the construction of $C^k$, start with the simple cases. The hypercube $C^1$
is the line segment $I_1$. If $C^1$ is translated by $\mathbf{e}_2$, the convex hull of its initial and final
positions describes a square $C^2$. (See Figure 9.) Translating $C^2$ by $\mathbf{e}_3$ creates the cube
$C^3$. A similar translation of $C^3$ by the vector $\mathbf{e}_4$ yields the 4-dimensional hypercube $C^4$.

Again, this is hard to visualize, but Figure 10 shows a 2-dimensional projection
of $C^4$. Each of the edges of $C^3$ is stretched into a square face of $C^4$. And each of the
square faces of $C^3$ is stretched into a cubic face of $C^4$. Figure 11 shows three facets
of $C^4$. Part (a) highlights the cube that comes from the left square face of $C^3$. Part (b)
shows the cube that comes from the front square face of $C^3$. And part (c) emphasizes
the cube that comes from the top square face of $C^3$.


² The vector sum of two sets $A$ and $B$ is defined by $A + B = \{\mathbf{c} : \mathbf{c} = \mathbf{a} + \mathbf{b} \text{ for some } \mathbf{a} \in A \text{ and } \mathbf{b} \in B\}$.

FIGURE 9 Constructing the cube $C^3$.

FIGURE 10 $C^4$ projected onto $\mathbb{R}^2$.

FIGURE 11 Three of the cubic facets of $C^4$.


Figure 12 shows another representation of $C^4$ in which the translated cube is placed
“inside” $C^3$. This makes it easier to visualize the cubic facets of $C^4$, since there is less
distortion.


FIGURE 12 The translated image of
$C^3$ is placed “inside” $C^3$ to obtain $C^4$.


Altogether, the 4-dimensional cube $C^4$ has eight cubic faces. Two come from the
original and translated images of $C^3$, and six come from the square faces of $C^3$ that are
stretched into cubes. The square 2-dimensional faces of $C^4$ come from the square faces
of $C^3$ and its translate, and the edges of $C^3$ that are stretched into squares. Thus there
are $2 \times 6 + 12 = 24$ square faces. To count the edges, take 2 times the number of edges
in $C^3$ and add the number of vertices in $C^3$. This makes $2 \times 12 + 8 = 32$ edges in $C^4$.
The vertices in $C^4$ all come from $C^3$ and its translate, so there are $2 \times 8 = 16$ vertices.
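The counting argument above follows a single pattern: each $k$-face of the new hypercube is either one of two copies of a $k$-face of the old one (original and translate) or a $(k-1)$-face of the old one stretched along the translation. The sketch below applies that pattern repeatedly. Starting from a single point, called $C^0$ here, is a convention adopted for this illustration and does not appear in the text; the function name is also made up.

```python
def hypercube_face_counts(n):
    """Return [f_0, ..., f_n] for C^n, where f_k counts k-dimensional faces (f_n = 1 is C^n itself)."""
    counts = [1]                          # assumed starting case: C^0 is a single point
    for _ in range(n):
        prev = counts + [0]               # make room for the new top dimension
        counts = [
            2 * prev[k] + (prev[k - 1] if k > 0 else 0)   # two copies plus stretched faces
            for k in range(len(prev))
        ]
    return counts


for n in range(1, 5):
    print(n, hypercube_face_counts(n))
```

For $n = 4$ the list begins 16, 32, 24, 8, matching the vertex, edge, square, and cube counts derived above.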

One of the truly remarkable results in the study of polytopes is the following formula,
first proved by Leonhard Euler (1707-1783). It establishes a simple relationship
between the number of faces of different dimensions in a polytope. To simplify the
statement of the formula, let $f_k(P)$ denote the number of $k$-dimensional faces of an
$n$-dimensional polytope $P$.³

\[ \text{Euler's formula:} \quad \sum_{k=0}^{n-1} (-1)^k f_k(P) = 1 + (-1)^{n-1} \]

In particular, when $n = 3$, $v - e + f = 2$, where $v$, $e$, and $f$ denote the number of
vertices, edges, and facets (respectively) of $P$.
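Euler's formula can be checked against the face counts that appear earlier in this section: the cube (8 vertices, 12 edges, 6 faces), the simplex $S^4$ (5 vertices, 10 edges, 10 triangles, 5 tetrahedra), and the hypercube $C^4$ (16, 32, 24, 8). The short sketch below does this; the function names are made up for the illustration.

```python
def euler_sum(face_counts):
    """Alternating sum f_0 - f_1 + f_2 - ... over the faces of dimension 0, ..., n-1."""
    return sum((-1) ** k * f for k, f in enumerate(face_counts))


def euler_rhs(n):
    """Right side of Euler's formula for an n-dimensional polytope."""
    return 1 + (-1) ** (n - 1)


examples = {
    "cube C^3": (3, [8, 12, 6]),
    "simplex S^4": (4, [5, 10, 10, 5]),
    "hypercube C^4": (4, [16, 32, 24, 8]),
}

for name, (n, counts) in examples.items():
    print(name, euler_sum(counts), euler_rhs(n))
```

Both sides equal 2 for the cube and 0 for the two 4-dimensional polytopes.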


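PRACTICE PROBLEM


Find the minimal representation of the polytope $P$ defined by the inequalities $A\mathbf{x} \le \mathbf{b}$
and $\mathbf{x} \ge \mathbf{0}$, when $A = \begin{bmatrix} 1 & 3 \\ 1 & 2 \\ 2 & 1 \end{bmatrix}$ and $\mathbf{b} = \begin{bmatrix} 12 \\ 9 \\ 12 \end{bmatrix}$.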


8.5 EXERCISES 


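1. Given points $\mathbf{p}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $\mathbf{p}_2 = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$, and $\mathbf{p}_3 = \begin{bmatrix} -1 \\ 2 \end{bmatrix}$ in $\mathbb{R}^2$,
let $S = \operatorname{conv}\{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$. For each linear functional $f$, find
the maximum value $m$ of $f$ on the set $S$, and find all points
$\mathbf{x}$ in $S$ at which $f(\mathbf{x}) = m$.

a. $f(x_1, x_2) = x_1 - x_2$
b. $f(x_1, x_2) = x_1 + x_2$
c. $f(x_1, x_2) = -3x_1 + x_2$


2. Given points $\mathbf{p}_1 = \begin{bmatrix} 0 \\ -1 \end{bmatrix}$, $\mathbf{p}_2 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$, and $\mathbf{p}_3 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ in $\mathbb{R}^2$,
let $S = \operatorname{conv}\{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$. For each linear functional $f$, find
the maximum value $m$ of $f$ on the set $S$, and find all points
$\mathbf{x}$ in $S$ at which $f(\mathbf{x}) = m$.

a. $f(x_1, x_2) = x_1 + x_2$
b. $f(x_1, x_2) = x_1 - x_2$
c. $f(x_1, x_2) = -2x_1 + x_2$


3. Repeat Exercise 1 where $m$ is the minimum value of $f$ on $S$
instead of the maximum value.


4. Repeat Exercise 2 where $m$ is the minimum value of $f$ on $S$
instead of the maximum value.


In Exercises 5-8, find the minimal representation of the polytope
defined by the inequalities $A\mathbf{x} \le \mathbf{b}$ and $\mathbf{x} \ge \mathbf{0}$.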


5. A = 


1 

3 



10 

15 




18 

16 

"18 
10 
_ 28 

" 8 " 

6 

7 


9. Let $S = \{(x, y) : x^2 + (y - 1)^2 \le 1\} \cup \{(3, 0)\}$. Is the origin
an extreme point of conv $S$? Is the origin a vertex of
conv $S$?


10. Find an example of a closed convex set $S$ in $\mathbb{R}^2$ such that its
profile $P$ is nonempty but conv $P \ne S$.

11. Find an example of a bounded convex set $S$ in $\mathbb{R}^2$ such that
its profile $P$ is nonempty but conv $P \ne S$.

12. a. Determine the number of $k$-faces of the 5-dimensional
simplex $S^5$ for $k = 0, 1, \ldots, 4$. Verify that your answer
satisfies Euler’s formula.

b. Make a chart of the values of $f_k(S^n)$ for $n = 1, \ldots, 5$ and
$k = 0, 1, \ldots, 4$. Can you see a pattern? Guess a general
formula for $f_k(S^n)$.


³ A proof when $n = 3$ is presented in Steven R. Lay, Convex Sets and Their Applications (New York:
John Wiley & Sons, 1982; Mineola, NY: Dover Publications, 2007), p. 131.

13. a. Determine the number of $k$-faces of the 5-dimensional
hypercube $C^5$ for $k = 0, 1, \ldots, 4$. Verify that your answer
satisfies Euler’s formula.

b. Make a chart of the values of $f_k(C^n)$ for $n = 1, \ldots, 5$ and
$k = 0, 1, \ldots, 4$. Can you see a pattern? Guess a general
formula for $f_k(C^n)$.

14. Suppose $\mathbf{v}_1, \ldots, \mathbf{v}_k$ are linearly independent vectors in
$\mathbb{R}^n$ $(1 \le k \le n)$. Then the set $X^k = \operatorname{conv}\{\pm\mathbf{v}_1, \ldots, \pm\mathbf{v}_k\}$ is
called a $k$-crosspolytope.

a. Sketch $X^1$ and $X^2$.

b. Determine the number of $k$-faces of the 3-dimensional
crosspolytope $X^3$ for $k = 0, 1, 2$. What is another name
for $X^3$?

c. Determine the number of $k$-faces of the 4-dimensional
crosspolytope $X^4$ for $k = 0, 1, 2, 3$. Verify that your answer
satisfies Euler’s formula.

d. Find a formula for $f_k(X^n)$, the number of $k$-faces of $X^n$,
for $0 \le k \le n - 1$.

15. A $k$-pyramid $P^k$ is the convex hull of a $(k - 1)$-polytope
$Q$ and a point $\mathbf{x} \notin \operatorname{aff} Q$. Find a formula for each of the
following in terms of $f_j(Q)$, $j = 0, \ldots, n - 1$.

a. The number of vertices of $P^n$: $f_0(P^n)$.

b. The number of $k$-faces of $P^n$: $f_k(P^n)$, for $1 \le k \le n - 2$.

c. The number of $(n - 1)$-dimensional facets of $P^n$:
$f_{n-1}(P^n)$.

In Exercises 16 and 17, mark each statement True or False. Justify
each answer.

16. a. A polytope is the convex hull of a finite set of points.

b. Let $\mathbf{p}$ be an extreme point of a convex set $S$. If $\mathbf{u}, \mathbf{v} \in S$,
$\mathbf{p} \in \overline{\mathbf{u}\mathbf{v}}$, and $\mathbf{p} \ne \mathbf{u}$, then $\mathbf{p} = \mathbf{v}$.

c. If $S$ is a nonempty convex subset of $\mathbb{R}^n$, then $S$ is the
convex hull of its profile.

d. The 4-dimensional simplex $S^4$ has exactly five facets,
each of which is a 3-dimensional tetrahedron.


17. a. A cube in $\mathbb{R}^3$ has exactly five facets.

b. A point $\mathbf{p}$ is an extreme point of a polytope $P$ if and only
if $\mathbf{p}$ is a vertex of $P$.

c. If $S$ is a nonempty compact convex set and a linear
functional attains its maximum at a point $\mathbf{p}$, then $\mathbf{p}$ is an
extreme point of $S$.

d. A 2-dimensional polytope always has the same number
of vertices and edges.


18. Let $\mathbf{v}$ be an element of the convex set $S$. Prove that $\mathbf{v}$ is an
extreme point of $S$ if and only if the set $\{\mathbf{x} \in S : \mathbf{x} \ne \mathbf{v}\}$ is
convex.


19. If $c \in \mathbb{R}$ and $S$ is a set, define $cS = \{c\mathbf{x} : \mathbf{x} \in S\}$. Let $S$
be a convex set and suppose $c > 0$ and $d > 0$. Prove that
$cS + dS = (c + d)S$.

20. Find an example to show that the convexity of $S$ is necessary
in Exercise 19.


21. If $A$ and $B$ are convex sets, prove that $A + B$ is convex.

22. A polyhedron (3-polytope) is called regular if all its facets
are congruent regular polygons and all the angles at the
vertices are equal. Supply the details in the following proof
that there are only five regular polyhedra.

a. Suppose that a regular polyhedron has $r$ facets, each of
which is a $k$-sided regular polygon, and that $s$ edges
meet at each vertex. Letting $v$ and $e$ denote the numbers
of vertices and edges in the polyhedron, explain why
$kr = 2e$ and $sv = 2e$.

b. Use Euler’s formula to show that $\dfrac{1}{s} + \dfrac{1}{k} = \dfrac{1}{2} + \dfrac{1}{e}$.

c. Find all the integral solutions of the equation in part
(b) that satisfy the geometric constraints of the problem.
(How small can $k$ and $s$ be?)

For your information, the five regular polyhedra are the
tetrahedron (4, 6, 4), the cube (8, 12, 6), the octahedron (6, 12,
8), the dodecahedron (20, 30, 12), and the icosahedron (12,
30, 20). (The numbers in parentheses indicate the numbers of
vertices, edges, and faces, respectively.)


SOLUTION TO PRACTICE PROBLEM
The matrix inequality $A\mathbf{x} \le \mathbf{b}$ yields the following system of inequalities:

(a) $x_1 + 3x_2 \le 12$

(b) $x_1 + 2x_2 \le 9$

(c) $2x_1 + x_2 \le 12$

The condition $\mathbf{x} \ge \mathbf{0}$ places the polytope in the first quadrant of the plane. One vertex
is $(0, 0)$. The $x_1$-intercepts of the three lines (when $x_2 = 0$) are 12, 9, and 6, so $(6, 0)$ is
a vertex. The $x_2$-intercepts of the three lines (when $x_1 = 0$) are 4, 4.5, and 12, so $(0, 4)$
is a vertex.

How do the three boundary lines intersect for positive values of $x_1$ and $x_2$? The
intersection of (a) and (b) is at $\mathbf{p}_{ab} = (3, 3)$. Testing $\mathbf{p}_{ab}$ in (c) gives $2(3) + 1(3) =
9 < 12$, so $\mathbf{p}_{ab}$ is in $P$. The intersection of (b) and (c) is at $\mathbf{p}_{bc} = (5, 2)$. Testing $\mathbf{p}_{bc}$
in (a) gives $1(5) + 3(2) = 11 < 12$, so $\mathbf{p}_{bc}$ is in $P$. The intersection of (a) and (c) is at
$\mathbf{p}_{ac} = (4.8, 2.4)$. Testing $\mathbf{p}_{ac}$ in (b) gives $1(4.8) + 2(2.4) = 9.6 > 9$. So $\mathbf{p}_{ac}$ is not in $P$.

Finally, the five vertices (extreme points) of the polytope are $(0, 0)$, $(6, 0)$, $(5, 2)$,
$(3, 3)$, and $(0, 4)$. These points form the minimal representation of $P$. This is displayed
graphically in Figure 13.


8.6 CURVES AND SURFACES 



For thousands of years, builders used long thin strips of wood to create the hull of a boat. 
In more recent times, designers used long, flexible metal strips to lay out the surfaces of 
cars and airplanes. Weights and pegs shaped the strips into smooth curves called natural 
cubic splines. The curve between two successive control points (pegs or weights) has 
a parametric representation using cubic polynomials. Unfortunately, such curves have 
the property that moving one control point affects the shape of the entire curve, because 
of physical forces that the pegs and weights exert on the strip. Design engineers had 
long wanted local control of the curve—in which movement of one control point would 
affect only a small portion of the curve. In 1962, a French automotive engineer, Pierre 
Bezier, solved this problem by adding extra control points and using a class of curves 
now called by his name. 


Bezier Curves 

The curves described below play an important role in computer graphics as well as 
engineering. For example, they are used in Adobe Illustrator and Macromedia Freehand, 
and in application programming languages such as OpenGL. These curves permit a 
program to store exact information about curved segments and surfaces in a relatively 
small number of control points. All graphics commands for the segments and surfaces 
have only to be computed for the control points. The special structure of these curves 
also speeds up other calculations in the “graphics pipeline” that creates the final display 
on the viewing screen. 

Exercises in Section 8.3 introduced quadratic Bezier curves and showed one method 
for constructing Bezier curves of higher degree. The discussion here focuses on quadratic 
and cubic Bezier curves, which are determined by three or four control points, denoted 
by $\mathbf{p}_0$, $\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$. These points can be in $\mathbb{R}^2$ or $\mathbb{R}^3$, or they can be represented by
homogeneous forms in $\mathbb{R}^3$ or $\mathbb{R}^4$. The standard parametric descriptions of these curves,
for $0 \le t \le 1$, are

\[ \mathbf{w}(t) = (1 - t)^2\mathbf{p}_0 + 2t(1 - t)\mathbf{p}_1 + t^2\mathbf{p}_2 \tag{1} \]

\[ \mathbf{x}(t) = (1 - t)^3\mathbf{p}_0 + 3t(1 - t)^2\mathbf{p}_1 + 3t^2(1 - t)\mathbf{p}_2 + t^3\mathbf{p}_3 \tag{2} \]

Figure 1 shows two typical curves. Usually, the curves pass through only the initial and 
terminal control points, but a Bezier curve is always in the convex hull of its control 
points. (See Exercises 21-24 in Section 8.3.) 
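
As an illustrative sketch of equations (1) and (2), the following Python function evaluates a Bezier curve from its control points. The function name `bezier_point` and the sample control points are assumptions made for this example, not part of the text. The weights are written with binomial coefficients, which reduce to the weights in (1) for three control points and to those in (2) for four.

```python
from math import comb

def bezier_point(control_points, t):
    """Evaluate a Bezier curve at t, weighting each control point as in (1) and (2)."""
    n = len(control_points) - 1          # degree: 2 for quadratic, 3 for cubic
    dim = len(control_points[0])
    point = [0.0] * dim
    for i, p in enumerate(control_points):
        # Weight of p_i: C(n, i) (1 - t)^(n - i) t^i
        weight = comb(n, i) * (1 - t) ** (n - i) * t ** i
        for k in range(dim):
            point[k] += weight * p[k]
    return tuple(point)

# Hypothetical control points, chosen only for illustration.
quadratic = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)]
cubic = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]

for t in (0.0, 0.5, 1.0):
    print(t, bezier_point(quadratic, t), bezier_point(cubic, t))
```

At $t = 0$ and $t = 1$ the function returns the initial and terminal control points, which matches the remark that the curves usually pass through only those two points.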




FIGURE 1 Quadratic and cubic Bezier curves.


Bezier curves are useful in computer graphics because their essential properties are 
preserved under the action of linear transformations and translations. For instance, if 
A is a matrix of appropriate size, then from the linearity of matrix multiplication, for 
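$0 \le t \le 1$,

\[ A\mathbf{x}(t) = A[(1 - t)^3\mathbf{p}_0 + 3t(1 - t)^2\mathbf{p}_1 + 3t^2(1 - t)\mathbf{p}_2 + t^3\mathbf{p}_3] \]

\[ \phantom{A\mathbf{x}(t)} = (1 - t)^3A\mathbf{p}_0 + 3t(1 - t)^2A\mathbf{p}_1 + 3t^2(1 - t)A\mathbf{p}_2 + t^3A\mathbf{p}_3 \]

The new control points are $A\mathbf{p}_0, \ldots, A\mathbf{p}_3$. Translations of Bezier curves are considered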
in Exercise 1. 
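
The invariance under a matrix $A$ can be checked numerically. The sketch below compares $A\mathbf{x}(t)$ with the Bezier curve built from the transformed control points $A\mathbf{p}_0, \ldots, A\mathbf{p}_3$ at several values of $t$. The matrix, the control points, and the helper names are hypothetical choices for illustration only.

```python
def cubic_bezier(points, t):
    """Cubic Bezier point from four control points, using the weights in (2)."""
    w = ((1 - t) ** 3, 3 * t * (1 - t) ** 2, 3 * t ** 2 * (1 - t), t ** 3)
    dim = len(points[0])
    return tuple(sum(w[i] * points[i][k] for i in range(4)) for k in range(dim))

def mat_vec(A, v):
    """Matrix-vector product A v for a matrix stored as a list of rows."""
    return tuple(sum(a * x for a, x in zip(row, v)) for row in A)

def largest_gap(A, points, samples=21):
    """Largest entrywise difference between A x(t) and the curve of the A p_i."""
    transformed = [mat_vec(A, p) for p in points]
    gap = 0.0
    for j in range(samples):
        t = j / (samples - 1)
        left = mat_vec(A, cubic_bezier(points, t))     # transform the curve point
        right = cubic_bezier(transformed, t)           # curve of transformed points
        gap = max(gap, max(abs(a - b) for a, b in zip(left, right)))
    return gap

A = [[2.0, 1.0], [0.0, 3.0]]                               # illustrative matrix
points = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]  # illustrative control points
print(largest_gap(A, points))   # expected to be zero up to rounding error
```

The practical consequence is the one stated in the text: a graphics program needs to transform only the control points, not every point of the curve.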

The curves in Figure 1 suggest that the control points determine the tangent lines 
to the curves at the initial and terminal control points. Recall from calculus that for any 
parametric curve, say $\mathbf{y}(t)$, the direction of the tangent line to the curve at a point $\mathbf{y}(t)$
is given by the derivative $\mathbf{y}'(t)$, called the tangent vector of the curve. (This derivative
is computed entry by entry.)


EXAMPLE 1 Determine how the tangent vector of the quadratic Bezier curve $\mathbf{w}(t)$
is related to the control points of the curve, at $t = 0$ and $t = 1$.

SOLUTION Write the weights in equation (1) as simple polynomials:

\[ \mathbf{w}(t) = (1 - 2t + t^2)\mathbf{p}_0 + (2t - 2t^2)\mathbf{p}_1 + t^2\mathbf{p}_2 \]

Then, because differentiation is a linear transformation on functions,

\[ \mathbf{w}'(t) = (-2 + 2t)\mathbf{p}_0 + (2 - 4t)\mathbf{p}_1 + 2t\mathbf{p}_2 \]

So

\[ \mathbf{w}'(0) = -2\mathbf{p}_0 + 2\mathbf{p}_1 = 2(\mathbf{p}_1 - \mathbf{p}_0) \]
\[ \mathbf{w}'(1) = -2\mathbf{p}_1 + 2\mathbf{p}_2 = 2(\mathbf{p}_2 - \mathbf{p}_1) \]

The tangent vector at $\mathbf{p}_0$, for instance, points from $\mathbf{p}_0$ to $\mathbf{p}_1$, but it is twice as long as the
segment from $\mathbf{p}_0$ to $\mathbf{p}_1$. Notice that $\mathbf{w}'(0) = \mathbf{0}$ when $\mathbf{p}_1 = \mathbf{p}_0$. In this case,
$\mathbf{w}(t) = (1 - t^2)\mathbf{p}_1 + t^2\mathbf{p}_2$, and the graph of $\mathbf{w}(t)$ is the line segment from $\mathbf{p}_1$
to $\mathbf{p}_2$. ■
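
The endpoint formulas of Example 1 are easy to confirm on sample data. The sketch below evaluates $\mathbf{w}'(t) = (-2 + 2t)\mathbf{p}_0 + (2 - 4t)\mathbf{p}_1 + 2t\mathbf{p}_2$ entry by entry; the function name and the control points are assumptions made for this illustration.

```python
def quadratic_tangent(p0, p1, p2, t):
    """Tangent vector w'(t) of a quadratic Bezier curve, computed entry by entry."""
    return tuple((-2 + 2 * t) * a + (2 - 4 * t) * b + 2 * t * c
                 for a, b, c in zip(p0, p1, p2))

# Illustrative control points.
p0, p1, p2 = (0, 0), (1, 2), (2, 0)

print(quadratic_tangent(p0, p1, p2, 0))   # equals 2(p1 - p0)
print(quadratic_tangent(p0, p1, p2, 1))   # equals 2(p2 - p1)
```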
Connecting Two Bezier Curves

Two basic Bezier curves can be joined end to end, with the terminal point of the first
curve $\mathbf{x}(t)$ being the initial point $\mathbf{p}_2$ of the second curve $\mathbf{y}(t)$. The combined curve is
said to have $G^0$ geometric continuity (at $\mathbf{p}_2$) because the two segments join at $\mathbf{p}_2$. If the
tangent line to curve 1 at $\mathbf{p}_2$ has a different direction than the tangent line to curve 2,
then a “corner,” or abrupt change of direction, may be apparent at $\mathbf{p}_2$. See Figure 2.



To avoid a sharp bend, it usually suffices to adjust the curves to have what is called
$G^1$ geometric continuity, where both tangent vectors at $\mathbf{p}_2$ point in the same direction.
That is, the derivatives $\mathbf{x}'(1)$ and $\mathbf{y}'(0)$ point in the same direction, even though their
magnitudes may be different. When the tangent vectors are actually equal at $\mathbf{p}_2$, the
tangent vector is continuous at $\mathbf{p}_2$, and the combined curve is said to have $C^1$ continuity,
or $C^1$ parametric continuity. Figure 3 shows $G^1$ continuity in (a) and $C^1$ continuity
in (b).




FIGURE 3 (a) $G^1$ continuity and (b) $C^1$ continuity.


EXAMPLE 2 Let $\mathbf{x}(t)$ and $\mathbf{y}(t)$ determine two quadratic Bezier curves, with control
points $\{\mathbf{p}_0, \mathbf{p}_1, \mathbf{p}_2\}$ and $\{\mathbf{p}_2, \mathbf{p}_3, \mathbf{p}_4\}$, respectively. The curves are joined at
$\mathbf{p}_2 = \mathbf{x}(1) = \mathbf{y}(0)$.

a. Suppose the combined curve has $G^1$ continuity (at $\mathbf{p}_2$). What algebraic restriction
does this condition impose on the control points? Express this restriction in geometric
language.

b. Repeat part (a) for $C^1$ continuity.

SOLUTION 

a. From Example 1, $\mathbf{x}'(1) = 2(\mathbf{p}_2 - \mathbf{p}_1)$. Also, using the control points for $\mathbf{y}(t)$ in place
of $\mathbf{w}(t)$, Example 1 shows that $\mathbf{y}'(0) = 2(\mathbf{p}_3 - \mathbf{p}_2)$. $G^1$ continuity means that
$\mathbf{y}'(0) = k\mathbf{x}'(1)$ for some positive constant $k$. Equivalently,

\[ \mathbf{p}_3 - \mathbf{p}_2 = k(\mathbf{p}_2 - \mathbf{p}_1), \quad \text{with } k > 0 \tag{3} \]

Geometrically, (3) implies that $\mathbf{p}_2$ lies on the line segment from $\mathbf{p}_1$ to $\mathbf{p}_3$. To
prove this, let $t = (k + 1)^{-1}$, and note that $0 < t < 1$. Solve for $k$ to obtain
$k = (1 - t)/t$. When this expression is used for $k$ in (3), a rearrangement shows
that $\mathbf{p}_2 = (1 - t)\mathbf{p}_1 + t\mathbf{p}_3$, which verifies the assertion about $\mathbf{p}_2$.

b. $C^1$ continuity means that $\mathbf{y}'(0) = \mathbf{x}'(1)$. Thus $2(\mathbf{p}_3 - \mathbf{p}_2) = 2(\mathbf{p}_2 - \mathbf{p}_1)$, so
$\mathbf{p}_3 - \mathbf{p}_2 = \mathbf{p}_2 - \mathbf{p}_1$, and $\mathbf{p}_2 = (\mathbf{p}_1 + \mathbf{p}_3)/2$. Geometrically, $\mathbf{p}_2$ is the midpoint of the
line segment from $\mathbf{p}_1$ to $\mathbf{p}_3$. See Figure 3. ■
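
The conditions found in Example 2 translate directly into a test on the three control points around the join. The following sketch, restricted to curves in the plane, classifies the join of two quadratic Bezier curves from $\mathbf{p}_1$, $\mathbf{p}_2$, $\mathbf{p}_3$. The function name, the tolerance, and the returned labels are assumptions for illustration.

```python
def join_continuity(p1, p2, p3, tol=1e-9):
    """Classify the join at p2 of two quadratic Bezier curves in the plane.

    Uses x'(1) = 2(p2 - p1) and y'(0) = 2(p3 - p2) from Example 2.
    """
    ax, ay = p2[0] - p1[0], p2[1] - p1[1]     # x'(1) / 2
    bx, by = p3[0] - p2[0], p3[1] - p2[1]     # y'(0) / 2
    if abs(ax - bx) <= tol and abs(ay - by) <= tol:
        return "C1"                            # tangent vectors are equal
    cross = ax * by - ay * bx                  # zero when the vectors are parallel
    dot = ax * bx + ay * by                    # positive when they point the same way
    if abs(cross) <= tol and dot > tol:
        return "G1"                            # y'(0) = k x'(1) with k > 0
    return "G0"                                # the curves join, possibly with a corner

print(join_continuity((0, 0), (1, 1), (2, 2)))   # p2 is the midpoint of p1 and p3
print(join_continuity((0, 0), (1, 1), (3, 3)))   # same direction, different lengths
print(join_continuity((0, 0), (1, 1), (2, 0)))   # tangent directions differ
```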


Figure 4 shows $C^1$ continuity for two cubic Bezier curves. Notice how the point
joining the two segments lies in the middle of the line segment between the adjacent
control points.



FIGURE 4 Two cubic Bezier curves. 


Two curves have $C^2$ (parametric) continuity when they have $C^1$ continuity and the
second derivatives $\mathbf{x}''(1)$ and $\mathbf{y}''(0)$ are equal. This is possible for cubic Bezier curves,
but it severely limits the positions of the control points. Another class of cubic curves,
called B-splines, always have $C^2$ continuity because each pair of curves share three
control points rather than one. Graphics figures using B-splines have more control points
and consequently require more computations. Some exercises for this section examine
these curves.

Surprisingly, if $\mathbf{x}(t)$ and $\mathbf{y}(t)$ join at $\mathbf{p}_3$, the apparent smoothness of the curve at
$\mathbf{p}_3$ is usually the same for both $G^1$ continuity and $C^1$ continuity. This is because the
magnitude of $\mathbf{x}'(t)$ is not related to the physical shape of the curve. The magnitude
reflects only the mathematical parameterization of the curve. For instance, if a new
vector function $\mathbf{z}(t)$ equals $\mathbf{x}(2t)$, then the point $\mathbf{z}(t)$ traverses the curve from $\mathbf{p}_0$ to
$\mathbf{p}_3$ twice as fast as the original version, because $2t$ reaches 1 when $t$ is .5. But, by the
chain rule of calculus, $\mathbf{z}'(t) = 2 \cdot \mathbf{x}'(2t)$, so the tangent vector to $\mathbf{z}(t)$ at $\mathbf{p}_3$ is twice the
tangent vector to $\mathbf{x}(t)$ at $\mathbf{p}_3$.

In practice, many simple Bezier curves are joined to create graphics objects. Typesetting
programs provide one important application, because many letters in a type font
involve curved segments. Each letter in a PostScript® font, for example, is stored as a 
set of control points, along with information on how to construct the “outline” of the 
letter using line segments and Bezier curves. Enlarging such a letter basically requires 
multiplying the coordinates of each control point by one constant scale factor. Once the 
outline of the letter has been computed, the appropriate solid parts of the letter are filled 
in. Figure 5 illustrates this for a character in a PostScript font. Note the control points. 
FIGURE 5 A PostScript character.



Matrix Equations for Bezier Curves 

Since a Bezier curve is a linear combination of control points using polynomials as 
weights, the formula for x(t) may be written as 


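\[ \mathbf{x}(t) = \begin{bmatrix} \mathbf{p}_0 & \mathbf{p}_1 & \mathbf{p}_2 & \mathbf{p}_3 \end{bmatrix} \begin{bmatrix} (1 - t)^3 \\ 3t(1 - t)^2 \\ 3t^2(1 - t) \\ t^3 \end{bmatrix} \]

\[ \phantom{\mathbf{x}(t)} = \begin{bmatrix} \mathbf{p}_0 & \mathbf{p}_1 & \mathbf{p}_2 & \mathbf{p}_3 \end{bmatrix} \begin{bmatrix} 1 - 3t + 3t^2 - t^3 \\ 3t - 6t^2 + 3t^3 \\ 3t^2 - 3t^3 \\ t^3 \end{bmatrix} \]

\[ \phantom{\mathbf{x}(t)} = \begin{bmatrix} \mathbf{p}_0 & \mathbf{p}_1 & \mathbf{p}_2 & \mathbf{p}_3 \end{bmatrix} \begin{bmatrix} 1 & -3 & 3 & -1 \\ 0 & 3 & -6 & 3 \\ 0 & 0 & 3 & -3 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ t \\ t^2 \\ t^3 \end{bmatrix} \]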

The matrix whose columns are the four control points is called a geometry matrix, $G$.
The $4 \times 4$ matrix of polynomial coefficients is the Bezier basis matrix, $M_B$. If $\mathbf{u}(t)$ is
the column vector of powers of $t$, then the Bezier curve is given by

\[ \mathbf{x}(t) = GM_B\mathbf{u}(t) \tag{4} \]
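
A short numerical check can make the factorization concrete. The sketch below computes a cubic Bezier point in two ways: from the basis matrix and the vector of powers of $t$, and directly from the four polynomial weights. The variable and function names and the sample control points are assumptions chosen for this illustration.

```python
# Bezier basis matrix: row i lists the coefficients of 1, t, t^2, t^3
# in the weight polynomial that multiplies control point p_i.
M_B = [
    [1, -3, 3, -1],
    [0, 3, -6, 3],
    [0, 0, 3, -3],
    [0, 0, 0, 1],
]

def bezier_matrix_form(points, t):
    """x(t) = G M_B u(t), where the columns of G are the control points."""
    u = [1, t, t ** 2, t ** 3]
    weights = [sum(M_B[i][j] * u[j] for j in range(4)) for i in range(4)]   # M_B u(t)
    dim = len(points[0])
    return tuple(sum(weights[i] * points[i][k] for i in range(4)) for k in range(dim))

def bezier_direct(points, t):
    """The same point computed from the factored weights (1 - t)^3, ..., t^3."""
    w = ((1 - t) ** 3, 3 * t * (1 - t) ** 2, 3 * t ** 2 * (1 - t), t ** 3)
    dim = len(points[0])
    return tuple(sum(w[i] * points[i][k] for i in range(4)) for k in range(dim))

points = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]   # illustrative control points
for t in (0.0, 0.25, 0.5, 1.0):
    print(t, bezier_matrix_form(points, t), bezier_direct(points, t))
```

Keeping $M_B$ as a separate table is what allows other cubic curves to reuse the same evaluation code with a different basis matrix.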


Other parametric cubic curves in computer graphics are written in this form, too. For
instance, if the entries in the matrix $M_B$ are changed appropriately, the resulting curves
are B-splines. They are “smoother” than Bezier curves, but they do not pass through any
of the control points. A Hermite cubic curve arises when the matrix $M_B$ is replaced by
a Hermite basis matrix. In this case, the columns of the geometry matrix consist of the
starting and ending points of the curves and the tangent vectors to the curves at those
points.¹

¹ The term basis matrix comes from the rows of the matrix that list the coefficients of the blending
polynomials used to define the curve. For a cubic Bezier curve, the four polynomials are $(1 - t)^3$, $3t(1 - t)^2$,
$3t^2(1 - t)$, and $t^3$. They form a basis for the space $\mathbb{P}_3$ of polynomials of degree 3 or less. Each entry in the
vector $\mathbf{x}(t)$ is a linear combination of these polynomials. The weights come from the rows of the geometry
matrix $G$ in (4).

The Bezier curve in equation (4) can also be “factored” in another way, to be used
in the discussion of Bezier surfaces. For convenience later, the parameter $t$ is replaced
by a parameter $s$:

\[ \mathbf{x}(s) = \mathbf{u}(s)^T M_B^T \begin{bmatrix} \mathbf{p}_0 \\ \mathbf{p}_1 \\ \mathbf{p}_2 \\ \mathbf{p}_3 \end{bmatrix} = \begin{bmatrix} (1 - s)^3 & 3s(1 - s)^2 & 3s^2(1 - s) & s^3 \end{bmatrix} \begin{bmatrix} \mathbf{p}_0 \\ \mathbf{p}_1 \\ \mathbf{p}_2 \\ \mathbf{p}_3 \end{bmatrix} \tag{5} \]




This formula is not quite the same as the transpose of the product on the right of 
(4), because x(s) and the control points appear in (5) without transpose symbols. The 
matrix of control points in (5) is called a geometry vector. This should be viewed as a 
$4 \times 1$ block (partitioned) matrix whose entries are column vectors. The matrix to the left
of the geometry vector, in the second part of (5), can be viewed as a block matrix, too, 
with a scalar in each block. The partitioned matrix multiplication makes sense, because 
each (vector) entry in the geometry vector can be left-multiplied by a scalar as well as 
by a matrix. Thus, the column vector x(s) is represented by (5). 


Bezier Surfaces 

A 3D bicubic surface patch can be constructed from a set of four Bezier curves. Consider 
the four geometry matrices 


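\[ \begin{bmatrix} \mathbf{p}_{11} & \mathbf{p}_{12} & \mathbf{p}_{13} & \mathbf{p}_{14} \end{bmatrix} \]
\[ \begin{bmatrix} \mathbf{p}_{21} & \mathbf{p}_{22} & \mathbf{p}_{23} & \mathbf{p}_{24} \end{bmatrix} \]
\[ \begin{bmatrix} \mathbf{p}_{31} & \mathbf{p}_{32} & \mathbf{p}_{33} & \mathbf{p}_{34} \end{bmatrix} \]
\[ \begin{bmatrix} \mathbf{p}_{41} & \mathbf{p}_{42} & \mathbf{p}_{43} & \mathbf{p}_{44} \end{bmatrix} \]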


and recall from equation (4) that a Bezier curve is produced when any one of these 
matrices is multiplied on the right by the following vector of weights: 



\[ M_B\mathbf{u}(t) = \begin{bmatrix} (1 - t)^3 \\ 3t(1 - t)^2 \\ 3t^2(1 - t) \\ t^3 \end{bmatrix} \]

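Let $G$ be the block (partitioned) $4 \times 4$ matrix whose entries are the control points $\mathbf{p}_{ij}$
displayed above. Then the following product is a block $4 \times 1$ matrix, and each entry is
a Bezier curve:

\[ GM_B\mathbf{u}(t) = \begin{bmatrix} \mathbf{p}_{11} & \mathbf{p}_{12} & \mathbf{p}_{13} & \mathbf{p}_{14} \\ \mathbf{p}_{21} & \mathbf{p}_{22} & \mathbf{p}_{23} & \mathbf{p}_{24} \\ \mathbf{p}_{31} & \mathbf{p}_{32} & \mathbf{p}_{33} & \mathbf{p}_{34} \\ \mathbf{p}_{41} & \mathbf{p}_{42} & \mathbf{p}_{43} & \mathbf{p}_{44} \end{bmatrix} \begin{bmatrix} (1 - t)^3 \\ 3t(1 - t)^2 \\ 3t^2(1 - t) \\ t^3 \end{bmatrix} \]

In fact,

\[ GM_B\mathbf{u}(t) = \begin{bmatrix} (1 - t)^3\mathbf{p}_{11} + 3t(1 - t)^2\mathbf{p}_{12} + 3t^2(1 - t)\mathbf{p}_{13} + t^3\mathbf{p}_{14} \\ (1 - t)^3\mathbf{p}_{21} + 3t(1 - t)^2\mathbf{p}_{22} + 3t^2(1 - t)\mathbf{p}_{23} + t^3\mathbf{p}_{24} \\ (1 - t)^3\mathbf{p}_{31} + 3t(1 - t)^2\mathbf{p}_{32} + 3t^2(1 - t)\mathbf{p}_{33} + t^3\mathbf{p}_{34} \\ (1 - t)^3\mathbf{p}_{41} + 3t(1 - t)^2\mathbf{p}_{42} + 3t^2(1 - t)\mathbf{p}_{43} + t^3\mathbf{p}_{44} \end{bmatrix} \]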
Now fix $t$. Then $GM_B\mathbf{u}(t)$ is a column vector that can be used as a geometry vector
in equation (5) for a Bezier curve in another variable $s$. This observation produces the
Bezier bicubic surface:

\[ \mathbf{x}(s, t) = \mathbf{u}(s)^T M_B^T GM_B\mathbf{u}(t), \quad \text{where } 0 \le s, t \le 1 \tag{6} \]


The formula for x(s,t) is a linear combination of the sixteen control points. If one 
imagines that these control points are arranged in a fairly uniform rectangular array, as 
in Figure 6, then the Bezier surface is controlled by a web of eight Bezier curves, four 
in the “$s$-direction” and four in the “$t$-direction.” The surface actually passes through
the four control points at its “corners.” When it is in the middle of a larger surface, the 
sixteen-point surface shares its twelve boundary control points with its neighbors. 
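
The two-stage construction behind equation (6) can be sketched in code: for a fixed $t$, each row of control points produces one point, and those four points then serve as control points for a curve in $s$. The function names and the sample grid below are hypothetical; the grid is chosen so that the result is easy to predict.

```python
def bernstein3(t):
    """The four cubic weights (1 - t)^3, 3t(1 - t)^2, 3t^2(1 - t), t^3."""
    return ((1 - t) ** 3, 3 * t * (1 - t) ** 2, 3 * t ** 2 * (1 - t), t ** 3)

def bezier_surface_point(grid, s, t):
    """Point x(s, t) of a bicubic patch; grid[i][j] holds the 3D control point p_(i+1)(j+1)."""
    bs, bt = bernstein3(s), bernstein3(t)
    # Stage 1: for fixed t, each row of the grid yields one entry of G M_B u(t).
    column = [tuple(sum(bt[j] * grid[i][j][k] for j in range(4)) for k in range(3))
              for i in range(4)]
    # Stage 2: use those four points as the geometry vector of a curve in s.
    return tuple(sum(bs[i] * column[i][k] for i in range(4)) for k in range(3))

# Illustrative 4 x 4 grid of control points: (i, j, i * j).
grid = [[(float(i), float(j), float(i * j)) for j in range(4)] for i in range(4)]

print(bezier_surface_point(grid, 0.0, 0.0))   # corner control point grid[0][0]
print(bezier_surface_point(grid, 1.0, 1.0))   # corner control point grid[3][3]
print(bezier_surface_point(grid, 0.5, 0.5))
```

The first two evaluations land on corner control points, consistent with the statement that the surface passes through the four control points at its corners.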


FIGURE 6 Sixteen control points for a Bezier bicubic surface patch.


Approximations to Curves and Surfaces 

In CAD programs and in programs used to create realistic computer games, the designer 
often works at a graphics workstation to compose a “scene” involving various geometric 
structures. This process requires interaction between the designer and the geometric
objects. Each slight repositioning of an object requires new mathematical computations by
the graphics program. Bezier curves and surfaces can be useful in this process because 
they involve fewer control points than objects approximated by many polygons. This 
dramatically reduces the computation time and speeds up the designer’s work. 

After the scene composition, however, the final image preparation has different 
computational demands that are more easily met by objects consisting of flat surfaces 
and straight edges, such as polyhedra. The designer needs to render the scene, by
introducing light sources, adding color and texture to surfaces, and simulating reflections
from the surfaces. 

Computing the direction of a reflected light at a point p on a surface, for instance, 
requires knowing the directions of both the incoming light and the surface normal— 
the vector perpendicular to the tangent plane at p. Computing such normal vectors is 
much easier on a surface composed of, say, tiny flat polygons than on a curved surface 
whose normal vector changes continuously as $\mathbf{p}$ moves. If $\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$ are adjacent
vertices of a flat polygon, then the surface normal is just plus or minus the cross product
$(\mathbf{p}_2 - \mathbf{p}_1) \times (\mathbf{p}_2 - \mathbf{p}_3)$. When the polygon is small, only one normal vector is needed for
rendering the entire polygon. Also, two widely used shading routines, Gouraud shading 
and Phong shading, both require a surface to be defined by polygons. 
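
The cross-product formula for a polygon's normal can be sketched as follows. The helper names are assumptions for this illustration, and normalizing to unit length is an extra step that the text does not require; the sign of the result depends on the order of the vertices.

```python
from math import sqrt

def subtract(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    """Cross product of two vectors in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def polygon_normal(p1, p2, p3):
    """Unit normal from adjacent vertices p1, p2, p3, using (p2 - p1) x (p2 - p3)."""
    n = cross(subtract(p2, p1), subtract(p2, p3))
    length = sqrt(sum(c * c for c in n))
    if length == 0:
        raise ValueError("the three vertices are collinear")
    return tuple(c / length for c in n)

# Three adjacent vertices of a polygon lying in the plane z = 0.
print(polygon_normal((1, 0, 0), (0, 0, 0), (0, 1, 0)))
```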

As a result of these needs for flat surfaces, the Bezier curves and surfaces from the 
scene composition stage now are usually approximated by straight line segments and 
polyhedral surfaces. The basic idea for approximating a Bezier curve or surface is to
divide the curve or surface into smaller pieces, with more and more control points. 


Recursive Subdivision of Bezier Curves and Surfaces 

Figure 7 shows the four control points $\mathbf{p}_0, \ldots, \mathbf{p}_3$ for a Bezier curve, along with control
points for two new curves, each coinciding with half of the original curve. The “left”
curve begins at $\mathbf{q}_0 = \mathbf{p}_0$ and ends at $\mathbf{q}_3$, at the midpoint of the original curve. The “right”
curve begins at $\mathbf{r}_0 = \mathbf{q}_3$ and ends at $\mathbf{r}_3 = \mathbf{p}_3$.

FIGURE 7 Subdivision of a Bezier curve.

Figure 8 shows how the new control points enclose regions that are “thinner” than 
the region enclosed by the original control points. As the distances between the control 
points decrease, the control points of each curve segment also move closer to a line 
segment. This variation-diminishing property of Bezier curves depends on the fact that 
a Bezier curve always lies in the convex hull of the control points. 




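The new control points are related to the original control points by simple formulas.
Of course, $\mathbf{q}_0 = \mathbf{p}_0$ and $\mathbf{r}_3 = \mathbf{p}_3$. The midpoint of the original curve $\mathbf{x}(t)$ occurs at $\mathbf{x}(.5)$
when $\mathbf{x}(t)$ has the standard parameterization,

\[ \mathbf{x}(t) = (1 - 3t + 3t^2 - t^3)\mathbf{p}_0 + (3t - 6t^2 + 3t^3)\mathbf{p}_1 + (3t^2 - 3t^3)\mathbf{p}_2 + t^3\mathbf{p}_3 \tag{7} \]

for $0 \le t \le 1$. Thus, the new control points $\mathbf{q}_3$ and $\mathbf{r}_0$ are given by

\[ \mathbf{q}_3 = \mathbf{r}_0 = \mathbf{x}(.5) = \tfrac{1}{8}(\mathbf{p}_0 + 3\mathbf{p}_1 + 3\mathbf{p}_2 + \mathbf{p}_3) \tag{8} \]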

The formulas for the remaining “interior” control points are also simple, but the derivation
of the formulas requires some work involving the tangent vectors of the curves. By
definition, the tangent vector to a parameterized curve $\mathbf{x}(t)$ is the derivative $\mathbf{x}'(t)$. This
vector shows the direction of the line tangent to the curve at $\mathbf{x}(t)$. For the Bezier curve
in (7),

\[ \mathbf{x}'(t) = (-3 + 6t - 3t^2)\mathbf{p}_0 + (3 - 12t + 9t^2)\mathbf{p}_1 + (6t - 9t^2)\mathbf{p}_2 + 3t^2\mathbf{p}_3 \]

for $0 \le t \le 1$. In particular,

\[ \mathbf{x}'(0) = 3(\mathbf{p}_1 - \mathbf{p}_0) \quad \text{and} \quad \mathbf{x}'(1) = 3(\mathbf{p}_3 - \mathbf{p}_2) \tag{9} \]
Geometrically, $\mathbf{p}_1$ is on the line tangent to the curve at $\mathbf{p}_0$, and $\mathbf{p}_2$ is on the line tangent
to the curve at $\mathbf{p}_3$. See Figure 8. Also, from $\mathbf{x}'(t)$, compute

\[ \mathbf{x}'(.5) = \tfrac{3}{4}(-\mathbf{p}_0 - \mathbf{p}_1 + \mathbf{p}_2 + \mathbf{p}_3) \tag{10} \]

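Let $\mathbf{y}(t)$ be the Bezier curve determined by $\mathbf{q}_0, \ldots, \mathbf{q}_3$, and let $\mathbf{z}(t)$ be the Bezier curve
determined by $\mathbf{r}_0, \ldots, \mathbf{r}_3$. Since $\mathbf{y}(t)$ traverses the same path as $\mathbf{x}(t)$ but only gets to
$\mathbf{x}(.5)$ as $t$ goes from 0 to 1, $\mathbf{y}(t) = \mathbf{x}(.5t)$ for $0 \le t \le 1$. Similarly, since $\mathbf{z}(t)$ starts at
$\mathbf{x}(.5)$ when $t = 0$, $\mathbf{z}(t) = \mathbf{x}(.5 + .5t)$ for $0 \le t \le 1$. By the chain rule for derivatives,

\[ \mathbf{y}'(t) = .5\mathbf{x}'(.5t) \quad \text{and} \quad \mathbf{z}'(t) = .5\mathbf{x}'(.5 + .5t) \quad \text{for } 0 \le t \le 1 \tag{11} \]

From (9) with $\mathbf{y}'(0)$ in place of $\mathbf{x}'(0)$, from (11) with $t = 0$, and from (9), the control
points for $\mathbf{y}(t)$ satisfy

\[ 3(\mathbf{q}_1 - \mathbf{q}_0) = \mathbf{y}'(0) = .5\mathbf{x}'(0) = \tfrac{3}{2}(\mathbf{p}_1 - \mathbf{p}_0) \tag{12} \]

From (9) with $\mathbf{y}'(1)$ in place of $\mathbf{x}'(1)$, from (11) with $t = 1$, and from (10),

\[ 3(\mathbf{q}_3 - \mathbf{q}_2) = \mathbf{y}'(1) = .5\mathbf{x}'(.5) = \tfrac{3}{8}(-\mathbf{p}_0 - \mathbf{p}_1 + \mathbf{p}_2 + \mathbf{p}_3) \tag{13} \]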

Equations (8), (9), (10), (12), and (13) can be solved to produce the formulas for $\mathbf{q}_0, \ldots,
\mathbf{q}_3$ shown in Exercise 13. Geometrically, the formulas are displayed in Figure 9. The
interior control points $\mathbf{q}_1$ and $\mathbf{r}_2$ are the midpoints, respectively, of the segment from $\mathbf{p}_0$
to $\mathbf{p}_1$ and the segment from $\mathbf{p}_2$ to $\mathbf{p}_3$. When the midpoint of the segment from $\mathbf{p}_1$ to $\mathbf{p}_2$
is connected to $\mathbf{q}_1$, the resulting line segment has $\mathbf{q}_2$ in the middle!




FIGURE 9 Geometric structure of new control points. 



This completes one step of the subdivision process. The “recursion” begins, and 
both new curves are subdivided. The recursion continues to a depth at which all curves 
are sufficiently straight. Alternatively, at each step the recursion can be “adaptive” and 
not subdivide one of the two new curves if that curve is sufficiently straight. Once the 
subdivision completely stops, the endpoints of each curve are joined by line segments, 
and the scene is ready for the next step in the final image preparation. 
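
The subdivision step and the adaptive recursion can be sketched as below for curves in the plane. The midpoint relations used are the ones described in the text and in Exercises 13 through 15. The flatness test (distance of the interior control points from the line through the endpoints), the tolerance, the depth limit, and all function names are assumptions made for this illustration, not the text's prescription.

```python
from math import hypot

def midpoint(a, b):
    return tuple((x + y) / 2 for x, y in zip(a, b))

def subdivide(ctrl):
    """Split control points (p0, p1, p2, p3) into left (q0..q3) and right (r0..r3) halves."""
    p0, p1, p2, p3 = ctrl
    q1 = midpoint(p0, p1)
    m = midpoint(p1, p2)          # midpoint of the segment from p1 to p2
    r2 = midpoint(p2, p3)
    q2 = midpoint(q1, m)
    r1 = midpoint(m, r2)
    q3 = midpoint(q2, r1)         # equals x(.5), and also r0
    return (p0, q1, q2, q3), (q3, r1, r2, p3)

def flatness(ctrl):
    """Largest distance of p1 and p2 from the line through p0 and p3 (plane curves only)."""
    (x0, y0), (x3, y3) = ctrl[0], ctrl[3]
    dx, dy = x3 - x0, y3 - y0
    length = hypot(dx, dy)
    def dist(p):
        if length == 0:
            return hypot(p[0] - x0, p[1] - y0)
        return abs(dx * (p[1] - y0) - dy * (p[0] - x0)) / length
    return max(dist(ctrl[1]), dist(ctrl[2]))

def flatten(ctrl, tol=1e-3, max_depth=12):
    """Adaptive recursion: subdivide until each piece is nearly straight, then join endpoints."""
    if max_depth == 0 or flatness(ctrl) <= tol:
        return [ctrl[0], ctrl[3]]
    left, right = subdivide(ctrl)
    return flatten(left, tol, max_depth - 1)[:-1] + flatten(right, tol, max_depth - 1)

curve = ((0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0))   # illustrative control points
polyline = flatten(curve, tol=0.01)
print(len(polyline), polyline[0], polyline[-1])
```

Only sums and divisions by 2 appear in `subdivide`, which is the reason this scheme is attractive for repeated use.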

A Bezier bicubic surface has the same variation-diminishing property as the Bezier 
curves that make up each cross-section of the surface, so the process described above 
can be applied in each cross-section. With the details omitted, here is the basic strategy. 
Consider the four “parallel” Bezier curves whose parameter is $s$, and apply the subdivision
process to each of them. This produces four sets of eight control points; each set
determines a curve as $s$ varies from 0 to 1. As $t$ varies, however, there are eight curves,
each with four control points. Apply the subdivision process to each of these sets of
four points, creating a total of 64 control points. Adaptive recursion is possible in this
setting, too, but there are some subtleties involved.²


2 See Foley, van Dam, Feiner, and Hughes, Computer Graphics—Principles and Practice , 2nd Ed. (Boston: 
Addison-Wesley, 1996), pp. 527-528. 
PRACTICE PROBLEMS


A spline usually refers to a curve that passes through specified points. A B-spline, 
however, usually does not pass through its control points. A single segment has the 
parametric form 


\[ \mathbf{x}(t) = \tfrac{1}{6}\left[(1 - t)^3\mathbf{p}_0 + (3t^3 - 6t^2 + 4)\mathbf{p}_1 + (-3t^3 + 3t^2 + 3t + 1)\mathbf{p}_2 + t^3\mathbf{p}_3\right] \tag{14} \]



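for $0 \le t \le 1$, where $\mathbf{p}_0$, $\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$ are the control points. When $t$ varies from 0 to 1,
$\mathbf{x}(t)$ creates a short curve that lies close to $\overline{\mathbf{p}_1\mathbf{p}_2}$. Basic algebra shows that the B-spline
formula can also be written as

\[ \mathbf{x}(t) = \tfrac{1}{6}\left[(1 - t)^3\mathbf{p}_0 + (3t(1 - t)^2 - 3t + 4)\mathbf{p}_1 + (3t^2(1 - t) + 3t + 1)\mathbf{p}_2 + t^3\mathbf{p}_3\right] \tag{15} \]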


This shows the similarity with the Bezier curve. Except for the 1/6 factor at the front,
the $\mathbf{p}_0$ and $\mathbf{p}_3$ terms are the same. The $\mathbf{p}_1$ component has been increased by $-3t + 4$
and the $\mathbf{p}_2$ component has been increased by $3t + 1$. These components move the curve
closer to $\overline{\mathbf{p}_1\mathbf{p}_2}$ than the Bezier curve. The 1/6 factor is necessary to keep the sum of the
coefficients equal to 1. Figure 10 compares a B-spline with a Bezier curve that has the
same control points.
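
A brief numerical sketch of formula (14) follows. It evaluates the four weights of a B-spline segment and confirms on sample values that they sum to 1. The function names and the control points are hypothetical, chosen only for illustration.

```python
def bspline_weights(t):
    """Weights of p0, p1, p2, p3 in the B-spline segment formula (14)."""
    return ((1 - t) ** 3 / 6,
            (3 * t ** 3 - 6 * t ** 2 + 4) / 6,
            (-3 * t ** 3 + 3 * t ** 2 + 3 * t + 1) / 6,
            t ** 3 / 6)

def bspline_point(points, t):
    """Point x(t) on the B-spline segment with the four given control points."""
    w = bspline_weights(t)
    dim = len(points[0])
    return tuple(sum(w[i] * points[i][k] for i in range(4)) for k in range(dim))

points = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]   # illustrative control points

for t in (0.0, 0.5, 1.0):
    print(t, sum(bspline_weights(t)), bspline_point(points, t))
```

With these sample points the segment starts and ends away from $\mathbf{p}_0$ and $\mathbf{p}_3$, in line with the remark that a B-spline usually does not pass through its control points.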




FIGURE 10 A B-spline segment and a Bezier curve. 


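1. Show that the B-spline does not begin at $\mathbf{p}_0$, but $\mathbf{x}(0)$ is in conv $\{\mathbf{p}_0, \mathbf{p}_1, \mathbf{p}_2\}$. Assuming
that $\mathbf{p}_0$, $\mathbf{p}_1$, and $\mathbf{p}_2$ are affinely independent, find the affine coordinates of $\mathbf{x}(0)$
with respect to $\{\mathbf{p}_0, \mathbf{p}_1, \mathbf{p}_2\}$.

2. Show that the B-spline does not end at $\mathbf{p}_3$, but $\mathbf{x}(1)$ is in conv $\{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$. Assuming
that $\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$ are affinely independent, find the affine coordinates of $\mathbf{x}(1)$ with
respect to $\{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$.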


8.6 EXERCISES 

1. Suppose a Bezier curve is translated to $\mathbf{x}(t) + \mathbf{b}$. That is, for
$0 \le t \le 1$, the new curve is

\[ \mathbf{x}(t) = (1 - t)^3\mathbf{p}_0 + 3t(1 - t)^2\mathbf{p}_1 + 3t^2(1 - t)\mathbf{p}_2 + t^3\mathbf{p}_3 + \mathbf{b} \]

Show that this new curve is again a Bezier curve. [Hint: 
Where are the new control points?] 

2. The parametric vector form of a B-spline curve was defined 
in the Practice Problems as 

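\[ \mathbf{x}(t) = \tfrac{1}{6}\left[(1 - t)^3\mathbf{p}_0 + (3t(1 - t)^2 - 3t + 4)\mathbf{p}_1 + (3t^2(1 - t) + 3t + 1)\mathbf{p}_2 + t^3\mathbf{p}_3\right] \quad \text{for } 0 \le t \le 1, \]

where $\mathbf{p}_0$, $\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$ are the control points.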


a. Show that for $0 \le t \le 1$, $\mathbf{x}(t)$ is in the convex hull of the
control points.

b. Suppose that a B-spline curve $\mathbf{x}(t)$ is translated to
$\mathbf{x}(t) + \mathbf{b}$ (as in Exercise 1). Show that this new curve is
again a B-spline.

3. Let $\mathbf{x}(t)$ be a cubic Bezier curve determined by points $\mathbf{p}_0$, $\mathbf{p}_1$,
$\mathbf{p}_2$, and $\mathbf{p}_3$.

a. Compute the tangent vector $\mathbf{x}'(t)$. Determine how $\mathbf{x}'(0)$
and $\mathbf{x}'(1)$ are related to the control points, and give geometric
descriptions of the directions of these tangent
vectors. Is it possible to have $\mathbf{x}'(1) = \mathbf{0}$?

b. Compute the second derivative $\mathbf{x}''(t)$ and determine how
$\mathbf{x}''(0)$ and $\mathbf{x}''(1)$ are related to the control points. Draw a
figure based on Figure 10, and construct a line segment
that points in the direction of $\mathbf{x}''(0)$. [Hint: Use $\mathbf{p}_1$ as the
origin of the coordinate system.]

4. Let $\mathbf{x}(t)$ be the B-spline in Exercise 2, with control points $\mathbf{p}_0$,
$\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$.

a. Compute the tangent vector $\mathbf{x}'(t)$ and determine how
the derivatives $\mathbf{x}'(0)$ and $\mathbf{x}'(1)$ are related to the control
points. Give geometric descriptions of the directions of
these tangent vectors. Explore what happens when both
$\mathbf{x}'(0)$ and $\mathbf{x}'(1)$ equal $\mathbf{0}$. Justify your assertions.

b. Compute the second derivative $\mathbf{x}''(t)$ and determine how
$\mathbf{x}''(0)$ and $\mathbf{x}''(1)$ are related to the control points. Draw a
figure based on Figure 10, and construct a line segment
that points in the direction of $\mathbf{x}''(1)$. [Hint: Use $\mathbf{p}_2$ as the
origin of the coordinate system.]

5. Let $\mathbf{x}(t)$ and $\mathbf{y}(t)$ be cubic Bezier curves with control points
$\{\mathbf{p}_0, \mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$ and $\{\mathbf{p}_3, \mathbf{p}_4, \mathbf{p}_5, \mathbf{p}_6\}$, respectively, so that $\mathbf{x}(t)$
and $\mathbf{y}(t)$ are joined at $\mathbf{p}_3$. The following questions refer to
the curve consisting of $\mathbf{x}(t)$ followed by $\mathbf{y}(t)$. For simplicity,
assume that the curve is in $\mathbb{R}^2$.

a. What condition on the control points will guarantee that
the curve has $C^1$ continuity at $\mathbf{p}_3$? Justify your answer.

b. What happens when $\mathbf{x}'(1)$ and $\mathbf{y}'(0)$ are both the zero
vector?

6. A B-spline is built out of B-spline segments, described in 
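Exercise 2. Let $\mathbf{p}_0, \ldots, \mathbf{p}_4$ be control points. For $0 \le t \le 1$,
let $\mathbf{x}(t)$ and $\mathbf{y}(t)$ be determined by the geometry matrices
$[\,\mathbf{p}_0 \;\; \mathbf{p}_1 \;\; \mathbf{p}_2 \;\; \mathbf{p}_3\,]$ and $[\,\mathbf{p}_1 \;\; \mathbf{p}_2 \;\; \mathbf{p}_3 \;\; \mathbf{p}_4\,]$, respectively.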
Notice how the two segments share three control points. 
The two segments do not overlap, however—they join at a 
common endpoint, close to p 2 . 

a. Show that the combined curve has $G^0$ continuity; that is,
$\mathbf{x}(1) = \mathbf{y}(0)$.

b. Show that the curve has $C^1$ continuity at the join point,
$\mathbf{x}(1)$. That is, show that $\mathbf{x}'(1) = \mathbf{y}'(0)$.

7. Let $\mathbf{x}(t)$ and $\mathbf{y}(t)$ be Bezier curves from Exercise 5, and suppose
the combined curve has $C^2$ continuity (which includes
$C^1$ continuity) at $\mathbf{p}_3$. Set $\mathbf{x}''(1) = \mathbf{y}''(0)$ and show that $\mathbf{p}_5$ is
completely determined by $\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$. Thus, the points
$\mathbf{p}_0, \ldots, \mathbf{p}_3$ and the $C^2$ condition determine all but one of the
control points for $\mathbf{y}(t)$.

8. Let $\mathbf{x}(t)$ and $\mathbf{y}(t)$ be segments of a B-spline as in
Exercise 6. Show that the curve has $C^2$ continuity (as well
as $C^1$ continuity) at $\mathbf{x}(1)$. That is, show that $\mathbf{x}''(1) = \mathbf{y}''(0)$.
This higher-order continuity is desirable in CAD applications
such as automotive body design, since the curves and
surfaces appear much smoother. However, B-splines require 
three times the computation of Bezier curves, for curves 
of comparable length. For surfaces, B-splines require nine 
times the computation of Bezier surfaces. Programmers often 
choose Bezier surfaces for applications (such as an airplane 
cockpit simulator) that require real-time rendering. 


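9. A quartic Bezier curve is determined by five control points,
$\mathbf{p}_0$, $\mathbf{p}_1$, $\mathbf{p}_2$, $\mathbf{p}_3$, and $\mathbf{p}_4$:

\[ \mathbf{x}(t) = (1 - t)^4\mathbf{p}_0 + 4t(1 - t)^3\mathbf{p}_1 + 6t^2(1 - t)^2\mathbf{p}_2 + 4t^3(1 - t)\mathbf{p}_3 + t^4\mathbf{p}_4 \quad \text{for } 0 \le t \le 1 \]

Construct the quartic basis matrix $M_B$ for $\mathbf{x}(t)$.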

10. The “B” in B-spline refers to the fact that a segment $\mathbf{x}(t)$ may
be written in terms of a basis matrix, $M_S$, in a form similar
to a Bezier curve. That is,

\[ \mathbf{x}(t) = GM_S\mathbf{u}(t) \quad \text{for } 0 \le t \le 1 \]

where $G$ is the geometry matrix $[\,\mathbf{p}_0 \;\; \mathbf{p}_1 \;\; \mathbf{p}_2 \;\; \mathbf{p}_3\,]$ and $\mathbf{u}(t)$
is the column vector $(1, t, t^2, t^3)$. In a uniform B-spline, each
segment uses the same basis matrix, but the geometry matrix
changes. Construct the basis matrix $M_S$ for $\mathbf{x}(t)$.

In Exercises 11 and 12, mark each statement True or False. Justify 
each answer. 

11. a. The cubic Bezier curve is based on four control points.

b. Given a quadratic Bezier curve $\mathbf{x}(t)$ with control points
$\mathbf{p}_0$, $\mathbf{p}_1$, and $\mathbf{p}_2$, the directed line segment $\mathbf{p}_1 - \mathbf{p}_0$ (from
$\mathbf{p}_0$ to $\mathbf{p}_1$) is the tangent vector to the curve at $\mathbf{p}_0$.

c. When two quadratic Bezier curves with control points
$\{\mathbf{p}_0, \mathbf{p}_1, \mathbf{p}_2\}$ and $\{\mathbf{p}_2, \mathbf{p}_3, \mathbf{p}_4\}$ are joined at $\mathbf{p}_2$, the combined
Bezier curve will have $C^1$ continuity at $\mathbf{p}_2$ if $\mathbf{p}_2$ is the
midpoint of the line segment between $\mathbf{p}_1$ and $\mathbf{p}_3$.

12. a. The essential properties of Bezier curves are preserved
under the action of linear transformations, but not
translations.

b. When two Bezier curves $\mathbf{x}(t)$ and $\mathbf{y}(t)$ are joined at the
point where $\mathbf{x}(1) = \mathbf{y}(0)$, the combined curve has $G^0$
continuity at that point.

c. The Bezier basis matrix is a matrix whose columns are
the control points of the curve.

Exercises 13-15 concern the subdivision of a Bezier curve shown
in Figure 7. Let $\mathbf{x}(t)$ be the Bezier curve, with control points
$\mathbf{p}_0, \ldots, \mathbf{p}_3$, and let $\mathbf{y}(t)$ and $\mathbf{z}(t)$ be the subdividing Bezier curves
as in the text, with control points $\mathbf{q}_0, \ldots, \mathbf{q}_3$ and $\mathbf{r}_0, \ldots, \mathbf{r}_3$,
respectively.

13. a. Use equation (12) to show that $\mathbf{q}_1$ is the midpoint of the
segment from $\mathbf{p}_0$ to $\mathbf{p}_1$.

b. Use equation (13) to show that

\[ 8\mathbf{q}_2 = 8\mathbf{q}_3 + \mathbf{p}_0 + \mathbf{p}_1 - \mathbf{p}_2 - \mathbf{p}_3 \]

c. Use part (b), equation (8), and part (a) to show that $\mathbf{q}_2$ is
the midpoint of the segment from $\mathbf{q}_1$ to the midpoint of the
segment from $\mathbf{p}_1$ to $\mathbf{p}_2$. That is, $\mathbf{q}_2 = \tfrac{1}{2}\left[\mathbf{q}_1 + \tfrac{1}{2}(\mathbf{p}_1 + \mathbf{p}_2)\right]$.

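14. a. Justify each equal sign:

\[ 3(\mathbf{r}_3 - \mathbf{r}_2) = \mathbf{z}'(1) = .5\mathbf{x}'(1) = \tfrac{3}{2}(\mathbf{p}_3 - \mathbf{p}_2). \]

b. Show that $\mathbf{r}_2$ is the midpoint of the segment from $\mathbf{p}_2$ to $\mathbf{p}_3$.

c. Justify each equal sign: $3(\mathbf{r}_1 - \mathbf{r}_0) = \mathbf{z}'(0) = .5\mathbf{x}'(.5)$.

d. Use part (c) to show that $8\mathbf{r}_1 = -\mathbf{p}_0 - \mathbf{p}_1 + \mathbf{p}_2 + \mathbf{p}_3 + 8\mathbf{r}_0$.

e. Use part (d), equation (8), and part (a) to show that $\mathbf{r}_1$ is
the midpoint of the segment from $\mathbf{r}_2$ to the midpoint of the
segment from $\mathbf{p}_1$ to $\mathbf{p}_2$. That is, $\mathbf{r}_1 = \tfrac{1}{2}\left[\mathbf{r}_2 + \tfrac{1}{2}(\mathbf{p}_1 + \mathbf{p}_2)\right]$.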

15. Sometimes only one half of a Bezier curve needs further 
subdividing. For example, subdivision of the “left” side is 
accomplished with parts (a) and (c) of Exercise 13 and 
equation (8). When both halves of the curve x(t) are divided, 
it is possible to organize calculations efficiently to calculate 
both left and right control points concurrently, without using 
equation (8) directly. 

a. Show that the tangent vectors $\mathbf{y}'(1)$ and $\mathbf{z}'(0)$ are equal.

b. Use part (a) to show that $\mathbf{q}_3$ (which equals $\mathbf{r}_0$) is the
midpoint of the segment from $\mathbf{q}_2$ to $\mathbf{r}_1$.

c. Using part (b) and the results of Exercises 13 and 14, write 
an algorithm that computes the control points for both 
y (t) and z (t) in an efficient manner. The only operations 
needed are sums and division by 2. 

16. Explain why a cubic Bezier curve is completely determined 
by x(0), x'(0), x(l), and x'(l). 

SOLUTIONS TO PRACTICE PROBLEMS 


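1. From equation (14) with $t = 0$, $\mathbf{x}(0) \ne \mathbf{p}_0$ because

\[ \mathbf{x}(0) = \tfrac{1}{6}[\mathbf{p}_0 + 4\mathbf{p}_1 + \mathbf{p}_2] = \tfrac{1}{6}\mathbf{p}_0 + \tfrac{2}{3}\mathbf{p}_1 + \tfrac{1}{6}\mathbf{p}_2. \]

The coefficients are nonnegative and sum to 1, so $\mathbf{x}(0)$ is in conv $\{\mathbf{p}_0, \mathbf{p}_1, \mathbf{p}_2\}$, and
the affine coordinates with respect to $\{\mathbf{p}_0, \mathbf{p}_1, \mathbf{p}_2\}$ are $\left(\tfrac{1}{6}, \tfrac{2}{3}, \tfrac{1}{6}\right)$.

2. From equation (14) with $t = 1$, $\mathbf{x}(1) \ne \mathbf{p}_3$ because

\[ \mathbf{x}(1) = \tfrac{1}{6}[\mathbf{p}_1 + 4\mathbf{p}_2 + \mathbf{p}_3] = \tfrac{1}{6}\mathbf{p}_1 + \tfrac{2}{3}\mathbf{p}_2 + \tfrac{1}{6}\mathbf{p}_3. \]

The coefficients are nonnegative and sum to 1, so $\mathbf{x}(1)$ is in conv $\{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$, and
the affine coordinates with respect to $\{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$ are $\left(\tfrac{1}{6}, \tfrac{2}{3}, \tfrac{1}{6}\right)$.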


17. TrueType® fonts, created by Apple Computer, use quadratic Bezier curves, while
PostScript® fonts, created by Adobe Systems, use cubic Bezier curves. The
cubic curves provide more flexibility for typeface design,
but it is important that every typeface using
quadratic curves can be transformed into one that uses cubic
curves. Suppose that $\mathbf{w}(t)$ is a quadratic curve, with control
points $\mathbf{p}_0$, $\mathbf{p}_1$, and $\mathbf{p}_2$.

a. Find control points $\mathbf{r}_0$, $\mathbf{r}_1$, $\mathbf{r}_2$, and $\mathbf{r}_3$ such that the cubic
Bezier curve $\mathbf{x}(t)$ with these control points has the property
that $\mathbf{x}(t)$ and $\mathbf{w}(t)$ have the same initial and terminal
points and the same tangent vectors at $t = 0$ and $t = 1$.
(See Exercise 16.)

b. Show that if $\mathbf{x}(t)$ is constructed as in part (a), then
$\mathbf{x}(t) = \mathbf{w}(t)$ for $0 \le t \le 1$.

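18. Use partitioned matrix multiplication to compute the following
matrix product, which appears in the alternative formula
(5) for a Bezier curve:

\[ \begin{bmatrix} 1 & 0 & 0 & 0 \\ -3 & 3 & 0 & 0 \\ 3 & -6 & 3 & 0 \\ -1 & 3 & -3 & 1 \end{bmatrix} \begin{bmatrix} \mathbf{p}_0 \\ \mathbf{p}_1 \\ \mathbf{p}_2 \\ \mathbf{p}_3 \end{bmatrix} \]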
Uniqueness of the Reduced Echelon Form


THEOREM Uniqueness of the Reduced Echelon Form

Each $m \times n$ matrix $A$ is row equivalent to a unique reduced echelon matrix $U$.


PROOF The proof uses the idea from Section 4.3 that the columns of row-equivalent 
matrices have exactly the same linear dependence relations. 

The row reduction algorithm shows that there exists at least one such matrix U . 
Suppose that A is row equivalent to matrices U and V in reduced echelon form. The 
leftmost nonzero entry in a row of U is a “leading 1.” Call the location of such a leading 
1 a pivot position, and call the column that contains it a pivot column. (This definition 
uses only the echelon nature of U and V and does not assume the uniqueness of the 
reduced echelon form.) 

The pivot columns of U and V are precisely the nonzero columns that are not 
linearly dependent on the columns to their left. (This condition is satisfied automatically 
by a first column if it is nonzero.) Since U and V are row equivalent (both being row 
equivalent to A), their columns have the same linear dependence relations. Hence, the 
pivot columns of U and V appear in the same locations. If there are r such columns, 
then since U and V are in reduced echelon form, their pivot columns are the first r 
columns of the $m \times m$ identity matrix. Thus, corresponding pivot columns of U and V
are equal. 

Finally, consider any nonpivot column of U, say column j. This column is either
zero or a linear combination of the pivot columns to its left (because those pivot columns
are a basis for the space spanned by the columns to the left of column j). Either case
can be expressed by writing $U\mathbf{x} = \mathbf{0}$ for some x whose jth entry is 1. Then $V\mathbf{x} = \mathbf{0}$,
too, which says that column j of V is either zero or the same linear combination of the
pivot columns of V to its left. Since corresponding pivot columns of U and V are equal,
columns j of U and V are also equal. This holds for all nonpivot columns, so $V = U$,
which proves that U is unique.


APPENDIX B  Complex Numbers

A complex number is a number written in the form

\[ z = a + bi \]

where a and b are real numbers and i is a formal symbol satisfying the relation $i^2 = -1$.
The number a is the real part of z, denoted by Re z, and b is the imaginary part of
z, denoted by Im z. Two complex numbers are considered equal if and only if their
real and imaginary parts are equal. For example, if $z = 5 + (-2)i$, then $\operatorname{Re} z = 5$ and
$\operatorname{Im} z = -2$. For simplicity, we write $z = 5 - 2i$.

A real number a is considered as a special type of complex number, by identifying
a with $a + 0i$. Furthermore, arithmetic operations on real numbers can be extended to
the set of complex numbers.

The complex number system, denoted by $\mathbb{C}$, is the set of all complex numbers,
together with the following operations of addition and multiplication:

\[ (a + bi) + (c + di) = (a + c) + (b + d)i \tag{1} \]

\[ (a + bi)(c + di) = (ac - bd) + (ad + bc)i \tag{2} \]

These rules reduce to ordinary addition and multiplication of real numbers when
b and d are zero in (1) and (2). It is readily checked that the usual laws of arithmetic
for $\mathbb{R}$ also hold for $\mathbb{C}$. For this reason, multiplication is usually computed by algebraic
expansion, as in the following example.

EXAMPLE 1

\[
\begin{aligned}
(5 - 2i)(3 + 4i) &= 15 + 20i - 6i - 8i^2 \\
&= 15 + 14i - 8(-1) \\
&= 23 + 14i
\end{aligned}
\]

That is, multiply each term of $5 - 2i$ by each term of $3 + 4i$, use $i^2 = -1$, and write
the result in the form $a + bi$. ■
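The multiplication rule (2) can also be applied directly to pairs of real numbers. The following sketch is illustrative only: the function name `multiply` and the representation of $a + bi$ as the pair `(a, b)` are assumptions made for this example. It reproduces the product computed above and compares it with Python's built-in `complex` type.

```python
def multiply(z, w):
    """Multiply complex numbers given as (real, imaginary) pairs, using rule (2)."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)

product = multiply((5, -2), (3, 4))
print(product)                           # (23, 14)
print(complex(5, -2) * complex(3, 4))    # (23+14j)
```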

Subtraction of complex numbers $z_1$ and $z_2$ is defined by

\[ z_1 - z_2 = z_1 + (-1)z_2 \]

In particular, we write $-z$ in place of $(-1)z$.

The conjugate of $z = a + bi$ is the complex number $\bar{z}$ (read as “z bar”), defined by

\[ \bar{z} = a - bi \]

Obtain $\bar{z}$ from z by reversing the sign of the imaginary part.


EXAMPLE 2 The conjugate of $-3 + 4i$ is $-3 - 4i$; write $\overline{-3 + 4i} = -3 - 4i$.


Observe that if $z = a + bi$, then

\[ z\bar{z} = (a + bi)(a - bi) = a^2 - abi + bai - b^2 i^2 = a^2 + b^2 \tag{3} \]

Since $z\bar{z}$ is real and nonnegative, it has a square root. The absolute value (or modulus)
of z is the real number $|z|$ defined by

\[ |z| = \sqrt{z\bar{z}} = \sqrt{a^2 + b^2} \]

If z is a real number, then $z = a + 0i$, and $|z| = \sqrt{a^2}$, which equals the ordinary
absolute value of a.

Some useful properties of conjugates and absolute value are listed below; w and z
denote complex numbers.

1. $\bar{z} = z$ if and only if z is a real number.

2. $\overline{w + z} = \bar{w} + \bar{z}$.

3. $\overline{wz} = \bar{w}\,\bar{z}$; in particular, $\overline{rz} = r\bar{z}$ if r is a real number.

4. $z\bar{z} = |z|^2 \ge 0$.

5. $|wz| = |w|\,|z|$.

6. $|w + z| \le |w| + |z|$.
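Properties 2 through 6 can be spot-checked for particular numbers. The sketch below is only a numerical illustration for one pair of values, not a proof; the function name `check_properties` is an assumption of this example. It relies on Python's built-in `complex` type, whose `conjugate()` method and `abs()` function correspond to $\bar{z}$ and $|z|$. Floating point comparisons use `math.isclose` where a square root is involved.

```python
import math

def check_properties(w, z):
    """Report whether properties 2 through 6 hold for the given w and z."""
    return {
        2: (w + z).conjugate() == w.conjugate() + z.conjugate(),
        3: (w * z).conjugate() == w.conjugate() * z.conjugate(),
        4: math.isclose((z * z.conjugate()).real, abs(z) ** 2),
        5: math.isclose(abs(w * z), abs(w) * abs(z)),
        6: abs(w + z) <= abs(w) + abs(z),
    }

results = check_properties(complex(3, 4), complex(5, -2))
print(all(results.values()))  # True
```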


If $z \ne 0$, then $|z| > 0$ and z has a multiplicative inverse, denoted by $1/z$ or $z^{-1}$
and given by

\[ \frac{1}{z} = z^{-1} = \frac{\bar{z}}{|z|^2} \]

Of course, a quotient $w/z$ simply means $w \cdot (1/z)$.


EXAMPLE 3 Let $w = 3 + 4i$ and $z = 5 - 2i$. Compute $z\bar{z}$, $|z|$, and $w/z$.

SOLUTION From equation (3),

\[ z\bar{z} = 5^2 + (-2)^2 = 25 + 4 = 29 \]

For the absolute value, $|z| = \sqrt{z\bar{z}} = \sqrt{29}$. To compute $w/z$, first multiply both the
numerator and the denominator by $\bar{z}$, the conjugate of the denominator. Because of (3),
this eliminates the i in the denominator:

\[
\begin{aligned}
\frac{w}{z} &= \frac{3 + 4i}{5 - 2i} \\
&= \frac{3 + 4i}{5 - 2i} \cdot \frac{5 + 2i}{5 + 2i} \\
&= \frac{15 + 6i + 20i - 8}{5^2 + (-2)^2} \\
&= \frac{7 + 26i}{29} \\
&= \frac{7}{29} + \frac{26}{29}\,i
\end{aligned}
\]
■
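The division procedure of Example 3 (multiply numerator and denominator by the conjugate of the denominator) can be written as a short routine. This is a sketch under stated assumptions: the function name `divide` and the pair representation `(a, b)` for $a + bi$ are choices made for this illustration, and the inputs are taken to be integers so that `Fraction` gives exact results.

```python
from fractions import Fraction

def divide(w, z):
    """Compute w / z for (real, imaginary) integer pairs, with z nonzero."""
    a, b = w
    c, d = z
    denom = c * c + d * d                  # z times its conjugate, by equation (3)
    # (a + bi)(c - di) = (ac + bd) + (bc - ad)i
    real = Fraction(a * c + b * d, denom)
    imag = Fraction(b * c - a * d, denom)
    return (real, imag)

quotient = divide((3, 4), (5, -2))
print(quotient)  # (Fraction(7, 29), Fraction(26, 29))
```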

Geometric Interpretation 

Each complex number $z = a + bi$ corresponds to a point (a, b) in the plane $\mathbb{R}^2$, as
in Figure 1. The horizontal axis is called the real axis because the points (a, 0) on it
correspond to the real numbers. The vertical axis is the imaginary axis because the
points (0, b) on it correspond to the pure imaginary numbers of the form $0 + bi$, or
simply bi. The conjugate of z is the mirror image of z in the real axis. The absolute
value of z is the distance from (a, b) to the origin.

FIGURE 1 The complex conjugate is a mirror image. (Horizontal axis: real axis; vertical axis: imaginary axis.)


Addition of complex numbers $z = a + bi$ and $w = c + di$ corresponds to vector
addition of (a, b) and (c, d) in $\mathbb{R}^2$, as in Figure 2.

FIGURE 2 Addition of complex numbers.

To give a graphical representation of complex multiplication, we use polar coordinates
in $\mathbb{R}^2$. Given a nonzero complex number $z = a + bi$, let $\varphi$ be the angle between
the positive real axis and the point (a, b), as in Figure 3, where $-\pi < \varphi \le \pi$. The angle
$\varphi$ is called the argument of z; we write $\varphi = \arg z$. From trigonometry,



\[ a = |z|\cos\varphi, \qquad b = |z|\sin\varphi \]

and so

\[ z = a + bi = |z|(\cos\varphi + i\sin\varphi) \]

FIGURE 3 Polar coordinates of z. (Axes: Re z, Im z.)

If w is another nonzero complex number, say,

\[ w = |w|(\cos\vartheta + i\sin\vartheta) \]

then, using standard trigonometric identities for the sine and cosine of the sum of two
angles, one can verify that

\[ wz = |w|\,|z|\,[\cos(\vartheta + \varphi) + i\sin(\vartheta + \varphi)] \tag{4} \]

See Figure 4. A similar formula may be written for quotients in polar form. The formulas
for products and quotients can be stated in words as follows.

FIGURE 4 Multiplication with polar coordinates.

The product of two nonzero complex numbers is given in polar form by the 
product of their absolute values and the sum of their arguments. The quotient 
of two nonzero complex numbers is given by the quotient of their absolute values 
and the difference of their arguments. 
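The rule for products can be tried out with the standard library's `cmath` module, where `cmath.polar` returns the absolute value and argument of a number and `cmath.rect` converts back. The function name `polar_product` and the sample values are assumptions made for this illustration; because the computation uses floating point, the result agrees with the exact product only up to roundoff.

```python
import cmath

def polar_product(w, z):
    """Multiply w and z by multiplying absolute values and adding arguments."""
    r_w, theta = cmath.polar(w)    # |w| and arg w
    r_z, phi = cmath.polar(z)      # |z| and arg z
    return cmath.rect(r_w * r_z, theta + phi)

w, z = complex(0, 1), complex(3, 1)
rotated = polar_product(w, z)

# Multiplying by i (absolute value 1, argument pi/2) gives approximately -1 + 3i.
print(cmath.isclose(rotated, w * z, abs_tol=1e-12))  # True
```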


EXAMPLE 4

a. If w has absolute value 1, then $w = \cos\vartheta + i\sin\vartheta$, where $\vartheta$ is the argument of w.
Multiplication of any nonzero number z by w simply rotates z through the angle $\vartheta$.

b. The argument of i itself is $\pi/2$ radians, so multiplication of z by i rotates z through
an angle of $\pi/2$ radians. For example, $3 + i$ is rotated into $(3 + i)i = -1 + 3i$. ■

Powers of a Complex Number

Formula (4) applies when $z = w = r(\cos\varphi + i\sin\varphi)$. In this case

\[ z^2 = r^2(\cos 2\varphi + i\sin 2\varphi) \]

and

\[
\begin{aligned}
z^3 &= z \cdot z^2 \\
&= r(\cos\varphi + i\sin\varphi) \cdot r^2(\cos 2\varphi + i\sin 2\varphi) \\
&= r^3(\cos 3\varphi + i\sin 3\varphi)
\end{aligned}
\]

In general, for any positive integer k,

\[ z^k = r^k(\cos k\varphi + i\sin k\varphi) \]

This fact is known as De Moivre’s Theorem.
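De Moivre's Theorem can be compared against repeated multiplication on a sample value. In this sketch the function name `power_de_moivre` is an assumption of the example, and `cmath.polar` and `cmath.rect` supply the conversion to and from polar form. For $z = 1 + i$, $r = \sqrt{2}$ and $\varphi = \pi/4$, so $z^8 = (\sqrt{2})^8(\cos 2\pi + i\sin 2\pi) = 16$, up to roundoff in floating point.

```python
import cmath

def power_de_moivre(z, k):
    """Compute z**k for a positive integer k from the polar form of z."""
    r, phi = cmath.polar(z)
    return cmath.rect(r ** k, k * phi)

z = complex(1, 1)
result = power_de_moivre(z, 8)

print(cmath.isclose(result, z ** 8, abs_tol=1e-9))  # True
```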


Complex Numbers and $\mathbb{R}^2$

Although the elements of $\mathbb{R}^2$ and $\mathbb{C}$ are in one-to-one correspondence, and the operations
of addition are essentially the same, there is a logical distinction between $\mathbb{R}^2$ and $\mathbb{C}$. In
$\mathbb{R}^2$ we can only multiply a vector by a real scalar, whereas in $\mathbb{C}$ we can multiply any
two complex numbers to obtain a third complex number. (The dot product in $\mathbb{R}^2$ doesn’t
count, because it produces a scalar, not an element of $\mathbb{R}^2$.) We use scalar notation for
elements in $\mathbb{C}$ to emphasize this distinction.

[Figures: The real plane $\mathbb{R}^2$ (axes $x_1$, $x_2$). The complex plane $\mathbb{C}$ (horizontal axis Re z).]


A

adjugate (or classical adjoint): The matrix adj A formed from
a square matrix A by replacing the (i, j)-entry of A by
the (i, j)-cofactor, for all i and j, and then transposing the
resulting matrix.

affine combination: A linear combination of vectors (points in
$\mathbb{R}^n$) in which the sum of the weights involved is 1.

affine dependence relation: An equation of the form
$c_1\mathbf{v}_1 + \cdots + c_p\mathbf{v}_p = \mathbf{0}$, where the weights $c_1, \ldots, c_p$ are not all
zero, and $c_1 + \cdots + c_p = 0$.

affine hull (or affine span) of a set S: The set of all affine
combinations of points in S, denoted by aff S.

affinely dependent set: A set $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ in $\mathbb{R}^n$ such that there
are real numbers $c_1, \ldots, c_p$, not all zero, such that
$c_1 + \cdots + c_p = 0$ and $c_1\mathbf{v}_1 + \cdots + c_p\mathbf{v}_p = \mathbf{0}$.

affinely independent set: A set $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ in $\mathbb{R}^n$ that is not
affinely dependent.

affine set (or affine subset): A set S of points such that if p and
q are in S, then $(1 - t)\mathbf{p} + t\mathbf{q} \in S$ for each real number t.

affine transformation: A mapping $T : \mathbb{R}^n \to \mathbb{R}^m$ of the form
$T(\mathbf{x}) = A\mathbf{x} + \mathbf{b}$, with A an $m \times n$ matrix and b in $\mathbb{R}^m$.


algebraic multiplicity: The multiplicity of an eigenvalue as a 
root of the characteristic equation. 

angle (between nonzero vectors u and v in $\mathbb{R}^2$ or $\mathbb{R}^3$): The angle
$\vartheta$ between the two directed line segments from the origin to
the points u and v. Related to the scalar product by

\[ \mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\|\,\|\mathbf{v}\|\cos\vartheta \]

associative law of multiplication: $A(BC) = (AB)C$, for all A,
B, C.

attractor (of a dynamical system in $\mathbb{R}^2$): The origin when all
trajectories tend toward $\mathbf{0}$.

augmented matrix: A matrix made up of a coefficient matrix 
for a linear system and one or more columns to the right. 
Each extra column contains the constants from the right side 
of a system with the given coefficient matrix. 

auxiliary equation: A polynomial equation in a variable r, 
created from the coefficients of a homogeneous difference 
equation. 

B 


backward phase (of row reduction): The last part of the algorithm
that reduces a matrix in echelon form to a reduced
echelon form.

band matrix: A matrix whose nonzero entries lie within a band 
along the main diagonal. 

barycentric coordinates (of a point p with respect to an affinely
independent set $S = \{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$): The (unique) set of
weights $c_1, \ldots, c_k$ such that $\mathbf{p} = c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k$ and
$c_1 + \cdots + c_k = 1$. (Sometimes also called the affine coordinates
of p with respect to S.)

basic variable: A variable in a linear system that corresponds
to a pivot column in the coefficient matrix.

basis (for a nontrivial subspace H of a vector space V): An
indexed set $\mathcal{B} = \{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ in V such that: (i) $\mathcal{B}$ is a
linearly independent set and (ii) the subspace spanned by $\mathcal{B}$
coincides with H, that is, $H = \operatorname{Span}\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$.

$\mathcal{B}$-coordinates of x: See coordinates of x relative to the
basis $\mathcal{B}$.

best approximation: The closest point in a given subspace to a 
given vector. 

bidiagonal matrix: A matrix whose nonzero entries lie on the 
main diagonal and on one diagonal adjacent to the main 
diagonal. 

block diagonal (matrix): A partitioned matrix $A = [A_{ij}]$ such
that each block $A_{ij}$ is a zero matrix for $i \ne j$.

block matrix: See partitioned matrix.

block matrix multiplication: The row-column multiplication
of partitioned matrices as if the block entries were scalars.

block upper triangular (matrix): A partitioned matrix
$A = [A_{ij}]$ such that each block $A_{ij}$ is a zero matrix for
$i > j$.

boundary point of a set S in $\mathbb{R}^n$: A point p such that every open
ball in $\mathbb{R}^n$ centered at p intersects both S and the complement
of S.

bounded set in $\mathbb{R}^n$: A set that is contained in an open ball
$B(\mathbf{0}, \delta)$ for some $\delta > 0$.

$\mathcal{B}$-matrix (for T): A matrix $[T]_{\mathcal{B}}$ for a linear transformation
$T : V \to V$ relative to a basis $\mathcal{B}$ for V, with the property that
$[T(\mathbf{x})]_{\mathcal{B}} = [T]_{\mathcal{B}}[\mathbf{x}]_{\mathcal{B}}$ for all x in V.

back-substitution (with matrix notation): The backward phase
of row reduction of an augmented matrix that transforms an
echelon matrix into a reduced echelon matrix; used to find
the solution(s) of a system of linear equations.


C

Cauchy-Schwarz inequality: $|\langle \mathbf{u}, \mathbf{v} \rangle| \le \|\mathbf{u}\| \cdot \|\mathbf{v}\|$
for all u, v.


change of basis: See change-of-coordinates matrix.

change-of-coordinates matrix (from a basis $\mathcal{B}$ to a basis $\mathcal{C}$): A
matrix $P_{\mathcal{C} \leftarrow \mathcal{B}}$ that transforms $\mathcal{B}$-coordinate vectors into
$\mathcal{C}$-coordinate vectors: $[\mathbf{x}]_{\mathcal{C}} = P_{\mathcal{C} \leftarrow \mathcal{B}}[\mathbf{x}]_{\mathcal{B}}$. If $\mathcal{C}$ is the standard
basis for $\mathbb{R}^n$, then $P_{\mathcal{C} \leftarrow \mathcal{B}}$ is sometimes written as $P_{\mathcal{B}}$.

characteristic equation (of A): $\det(A - \lambda I) = 0$.

characteristic polynomial (of A): $\det(A - \lambda I)$ or, in some
texts, $\det(\lambda I - A)$.

Cholesky factorization: A factorization $A = R^T R$, where R is
an invertible upper triangular matrix whose diagonal entries
are all positive.

closed ball (in $\mathbb{R}^n$): A set $\{\mathbf{x} : \|\mathbf{x} - \mathbf{p}\| \le \delta\}$ in $\mathbb{R}^n$, where p is
in $\mathbb{R}^n$ and $\delta > 0$.

closed set (in $\mathbb{R}^n$): A set that contains all of its boundary points.

codomain (of a transformation $T : \mathbb{R}^n \to \mathbb{R}^m$): The set $\mathbb{R}^m$ that
contains the range of T. In general, if T maps a vector space
V into a vector space W, then W is called the codomain
of T.

coefficient matrix: A matrix whose entries are the coefficients
of a system of linear equations.

cofactor: A number $C_{ij} = (-1)^{i+j}\det A_{ij}$, called the (i, j)-cofactor
of A, where $A_{ij}$ is the submatrix formed by deleting
the ith row and the jth column of A.

cofactor expansion: A formula for det A using cofactors associated
with one row or one column, such as for row 1:

\[ \det A = a_{11}C_{11} + \cdots + a_{1n}C_{1n} \]

column-row expansion: The expression of a product AB
as a sum of outer products:
$\operatorname{col}_1(A)\operatorname{row}_1(B) + \cdots + \operatorname{col}_n(A)\operatorname{row}_n(B)$,
where n is the number of columns of A.

column space (of an $m \times n$ matrix A): The set Col A of all
linear combinations of the columns of A. If $A = [\mathbf{a}_1 \ \cdots \ \mathbf{a}_n]$,
then $\operatorname{Col} A = \operatorname{Span}\{\mathbf{a}_1, \ldots, \mathbf{a}_n\}$. Equivalently,

\[ \operatorname{Col} A = \{\mathbf{y} : \mathbf{y} = A\mathbf{x} \text{ for some } \mathbf{x} \text{ in } \mathbb{R}^n\} \]

column sum: The sum of the entries in a column of a matrix.

column vector: A matrix with only one column, or a single
column of a matrix that has several columns.

commuting matrices: Two matrices A and B such that
$AB = BA$.

compact set (in $\mathbb{R}^n$): A set in $\mathbb{R}^n$ that is both closed and
bounded.

companion matrix: A special form of matrix whose characteristic
polynomial is $(-1)^n p(\lambda)$ when $p(\lambda)$ is a specified
polynomial whose leading term is $\lambda^n$.

complex eigenvalue: A nonreal root of the characteristic equation
of an $n \times n$ matrix.

complex eigenvector: A nonzero vector x in $\mathbb{C}^n$ such that
$A\mathbf{x} = \lambda\mathbf{x}$, where A is an $n \times n$ matrix and $\lambda$ is a complex
eigenvalue.

component of y orthogonal to u (for $\mathbf{u} \ne \mathbf{0}$): The vector

\[ \mathbf{y} - \frac{\mathbf{y} \cdot \mathbf{u}}{\mathbf{u} \cdot \mathbf{u}}\,\mathbf{u} \]

composition of linear transformations: A mapping produced
by applying two or more linear transformations in succession.
If the transformations are matrix transformations, say
left-multiplication by B followed by left-multiplication by
A, then the composition is the mapping $\mathbf{x} \mapsto A(B\mathbf{x})$.

condition number (of A): The quotient $\sigma_1/\sigma_n$, where $\sigma_1$ is the
largest singular value of A and $\sigma_n$ is the smallest singular
value. The condition number is $+\infty$ when $\sigma_n$ is zero.

conformable for block multiplication: Two partitioned matrices
A and B such that the block product AB is defined: The
column partition of A must match the row partition of B.

consistent linear system: A linear system with at least one
solution.

constrained optimization: The problem of maximizing a quantity
such as $\mathbf{x}^T A\mathbf{x}$ or $\|A\mathbf{x}\|$ when x is subject to one or more
constraints, such as $\mathbf{x}^T\mathbf{x} = 1$ or $\mathbf{x}^T\mathbf{v} = 0$.

consumption matrix: A matrix in the Leontief input-output
model whose columns are the unit consumption vectors for
the various sectors of an economy.

contraction: A mapping $\mathbf{x} \mapsto r\mathbf{x}$ for some scalar r, with
$0 \le r \le 1$.

controllable (pair of matrices): A matrix pair (A, B) where A
is $n \times n$, B has n rows, and

\[ \operatorname{rank}\,[\,B \ \ AB \ \ A^2B \ \ \cdots \ \ A^{n-1}B\,] = n \]

Related to a state-space model of a control system and the
difference equation $\mathbf{x}_{k+1} = A\mathbf{x}_k + B\mathbf{u}_k$ $(k = 0, 1, \ldots)$.

convergent (sequence of vectors): A sequence $\{\mathbf{x}_k\}$ such that
the entries in $\mathbf{x}_k$ can be made as close as desired to the entries
in some fixed vector for all k sufficiently large.

convex combination (of points $\mathbf{v}_1, \ldots, \mathbf{v}_k$ in $\mathbb{R}^n$): A linear
combination of vectors (points) in which the weights in the
combination are nonnegative and the sum of the weights
is 1.

convex hull (of a set S): The set of all convex combinations of
points in S, denoted by conv S.

convex set: A set S with the property that for each p and q in
S, the line segment $\overline{\mathbf{pq}}$ is contained in S.

coordinate mapping (determined by an ordered basis $\mathcal{B}$ in a
vector space V): A mapping that associates to each x in
V its coordinate vector $[\mathbf{x}]_{\mathcal{B}}$.

coordinates of x relative to the basis $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$: The
weights $c_1, \ldots, c_n$ in the equation $\mathbf{x} = c_1\mathbf{b}_1 + \cdots + c_n\mathbf{b}_n$.

coordinate vector of x relative to $\mathcal{B}$: The vector $[\mathbf{x}]_{\mathcal{B}}$ whose
entries are the coordinates of x relative to the basis $\mathcal{B}$.

covariance (of variables $x_i$ and $x_j$, for $i \ne j$): The entry $s_{ij}$ in
the covariance matrix S for a matrix of observations, where
$x_i$ and $x_j$ vary over the ith and jth coordinates, respectively,
of the observation vectors.

covariance matrix (or sample covariance matrix): The $p \times p$
matrix S defined by $S = (N - 1)^{-1}BB^T$, where B is a
$p \times N$ matrix of observations in mean-deviation form.

Cramer’s rule: A formula for each entry in the solution x of
the equation $A\mathbf{x} = \mathbf{b}$ when A is an invertible matrix.

cross-product term: A term $c\,x_i x_j$ in a quadratic form, with
$i \ne j$.

cube: A three-dimensional solid object bounded by six square
faces, with three faces meeting at each vertex.


D

decoupled system: A difference equation $\mathbf{y}_{k+1} = A\mathbf{y}_k$, or a
differential equation $\mathbf{y}'(t) = A\mathbf{y}(t)$, in which A is a diagonal
matrix. The discrete evolution of each entry in $\mathbf{y}_k$ (as a
function of k), or the continuous evolution of each entry
in the vector-valued function $\mathbf{y}(t)$, is unaffected by what
happens to the other entries as $k \to \infty$ or $t \to \infty$.

design matrix: The matrix X in the linear model $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$,
where the columns of X are determined in some way by the
observed values of some independent variables.

determinant (of a square matrix A): The number det A defined
inductively by a cofactor expansion along the first row of A.
Also, $(-1)^r$ times the product of the diagonal entries in any
echelon form U obtained from A by row replacements and
r row interchanges (but no scaling operations).

diagonal entries (in a matrix): Entries having equal row and
column indices.

diagonalizable (matrix): A matrix that can be written in factored
form as $PDP^{-1}$, where D is a diagonal matrix and P
is an invertible matrix.

diagonal matrix: A square matrix whose entries not on the
main diagonal are all zero.

difference equation (or linear recurrence relation): An equation
of the form $\mathbf{x}_{k+1} = A\mathbf{x}_k$ $(k = 0, 1, 2, \ldots)$ whose solution
is a sequence of vectors, $\mathbf{x}_0, \mathbf{x}_1, \ldots$.

dilation: A mapping $\mathbf{x} \mapsto r\mathbf{x}$ for some scalar r, with $1 < r$.

dimension:

of a flat S: The dimension of the corresponding parallel
subspace.

of a set S: The dimension of the smallest flat containing S.

of a subspace S: The number of vectors in a basis for S,
written as dim S.

of a vector space V: The number of vectors in a basis for V,
written as dim V. The dimension of the zero space is 0.

discrete linear dynamical system: A difference equation of the
form $\mathbf{x}_{k+1} = A\mathbf{x}_k$ that describes the changes in a system
(usually a physical system) as time passes. The physical
system is measured at discrete times, when $k = 0, 1, 2, \ldots$,
and the state of the system at time k is a vector $\mathbf{x}_k$ whose
entries provide certain facts of interest about the system.

distance between u and v: The length of the vector $\mathbf{u} - \mathbf{v}$,
denoted by dist(u, v).

distance to a subspace: The distance from a given point (vector)
v to the nearest point in the subspace.

distributive laws: (left) $A(B + C) = AB + AC$, and (right)
$(B + C)A = BA + CA$, for all A, B, C.

domain (of a transformation T): The set of all vectors x for
which T(x) is defined.

dot product: See inner product.

dynamical system: See discrete linear dynamical system.


E


echelon form (or row echelon form, of a matrix): An echelon 
matrix that is row equivalent to the given matrix. 

echelon matrix (or row echelon matrix) : A rectangular matrix 
that has three properties: (1) All nonzero rows are above 
any row of all zeros. (2) Each leading entry of a row is in 
a column to the right of the leading entry of the row above 
it. (3) All entries in a column below a leading entry are zero. 

eigenfunctions (of a differential equation $\mathbf{x}'(t) = A\mathbf{x}(t)$): A
function $\mathbf{x}(t) = \mathbf{v}e^{\lambda t}$, where v is an eigenvector of A and $\lambda$
is the corresponding eigenvalue.

eigenspace (of A corresponding to $\lambda$): The set of all solutions
of $A\mathbf{x} = \lambda\mathbf{x}$, where $\lambda$ is an eigenvalue of A. Consists of the
zero vector and all eigenvectors corresponding to $\lambda$.

eigenvalue (of A): A scalar $\lambda$ such that the equation $A\mathbf{x} = \lambda\mathbf{x}$
has a solution for some nonzero vector x.

eigenvector (of A): A nonzero vector x such that $A\mathbf{x} = \lambda\mathbf{x}$ for
some scalar $\lambda$.

eigenvector basis: A basis consisting entirely of eigenvectors
of a given matrix.

eigenvector decomposition (of x): An equation,
$\mathbf{x} = c_1\mathbf{v}_1 + \cdots + c_n\mathbf{v}_n$, expressing x as a linear combination of
eigenvectors of a matrix.


elementary matrix: An invertible matrix that results by performing
one elementary row operation on an identity matrix.

elementary row operations: (1) (Replacement) Replace one 
row by the sum of itself and a multiple of another row. 
(2) Interchange two rows. (3) (Scaling) Multiply all entries 
in a row by a nonzero constant. 


equal vectors: Vectors in $\mathbb{R}^n$ whose corresponding entries are
the same.


equilibrium prices: A set of prices for the total output of the 
various sectors in an economy, such that the income of each 
sector exactly balances its expenses. 

equilibrium vector: See steady-state vector. 

equivalent (linear) systems: Linear systems with the same 
solution set. 


exchange model: See Leontief exchange model. 

existence question: Asks, “Does a solution to the system exist?”
That is, “Is the system consistent?” Also, “Does a
solution of $A\mathbf{x} = \mathbf{b}$ exist for all possible b?”

expansion by cofactors: See cofactor expansion.

explicit description (of a subspace W of $\mathbb{R}^n$): A parametric
representation of W as the set of all linear combinations of
a set of specified vectors.

extreme point (of a convex set S): A point p in S such that p is
not in the interior of any line segment that lies in S. (That is,
if x, y are in S and p is on the line segment $\overline{\mathbf{xy}}$, then $\mathbf{p} = \mathbf{x}$
or $\mathbf{p} = \mathbf{y}$.)


F 

factorization (of A): An equation that expresses A as a product 
of two or more matrices. 

final demand vector (or bill of final demands): The vector d 
in the Leontief input-output model that lists the dollar values 
of the goods and services demanded from the various sectors 
by the nonproductive part of the economy. The vector d 
can represent consumer demand, government consumption, 
surplus production, exports, or other external demand. 

finite-dimensional (vector space): A vector space that is 
spanned by a finite set of vectors. 

flat (in $\mathbb{R}^n$): A translate of a subspace of $\mathbb{R}^n$.

flexibility matrix: A matrix whose jth column gives the deflections
of an elastic beam at specified points when a unit
force is applied at the jth point on the beam.

floating point arithmetic: Arithmetic with numbers represented
as decimals $\pm .d_1 \cdots d_p \times 10^r$, where r is an integer
and the number p of digits to the right of the decimal point
is usually between 8 and 16.

flop: One arithmetic operation (+,—,*,/) on two real floating 
point numbers. 

forward phase (of row reduction): The first part of the algorithm
that reduces a matrix to echelon form.

Fourier approximation (of order n): The closest point in the
subspace of nth-order trigonometric polynomials to a given
function in $C[0, 2\pi]$.

Fourier coefficients: The weights used to make a trigonometric
polynomial as a Fourier approximation to a function.

Fourier series: An infinite series that converges to a function
in the inner product space $C[0, 2\pi]$, with the inner product
given by a definite integral.

free variable: Any variable in a linear system that is not a basic 
variable. 

full rank (matrix): An $m \times n$ matrix whose rank is the smaller
of m and n.

fundamental set of solutions: A basis for the set of all solutions
of a homogeneous linear difference or differential equation.

fundamental subspaces (determined by A): The null space and
column space of A, and the null space and column space of
$A^T$, with $\operatorname{Col} A^T$ commonly called the row space of A.


G 


Gaussian elimination: See row reduction algorithm. 

general least-squares problem: Given an $m \times n$ matrix
A and a vector b in $\mathbb{R}^m$, find $\hat{\mathbf{x}}$ in $\mathbb{R}^n$ such that
$\|\mathbf{b} - A\hat{\mathbf{x}}\| \le \|\mathbf{b} - A\mathbf{x}\|$ for all x in $\mathbb{R}^n$.

general solution (of a linear system): A parametric description
of a solution set that expresses the basic variables in terms of
the free variables (the parameters), if any. After Section 1.5,
the parametric description is written in vector form.

Givens rotation: A linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^n$ used in
computer programs to create zero entries in a vector (usually
a column of a matrix).

Gram matrix (of A): The matrix $A^T A$.

Gram-Schmidt process: An algorithm for producing an orthogonal
or orthonormal basis for a subspace that is spanned
by a given set of vectors.


H

homogeneous coordinates: In $\mathbb{R}^3$, the representation of
(x, y, z) as (X, Y, Z, H) for any $H \ne 0$, where $x = X/H$,
$y = Y/H$, and $z = Z/H$. In $\mathbb{R}^2$, H is usually taken as 1,
and the homogeneous coordinates of (x, y) are written as
(x, y, 1).

homogeneous equation: An equation of the form $A\mathbf{x} = \mathbf{0}$, possibly
written as a vector equation or as a system of linear
equations.

homogeneous form of (a vector) v in $\mathbb{R}^n$: The point
$\tilde{\mathbf{v}} = \begin{bmatrix} \mathbf{v} \\ 1 \end{bmatrix}$ in $\mathbb{R}^{n+1}$.

Householder reflection: A transformation $\mathbf{x} \mapsto Q\mathbf{x}$, where
$Q = I - 2\mathbf{u}\mathbf{u}^T$ and u is a unit vector ($\mathbf{u}^T\mathbf{u} = 1$).

hyperplane (in $\mathbb{R}^n$): A flat in $\mathbb{R}^n$ of dimension $n - 1$. Also: a
translate of a subspace of dimension $n - 1$.


I

identity matrix (denoted by I or $I_n$): A square matrix with ones
on the diagonal and zeros elsewhere.

ill-conditioned matrix: A square matrix with a large (or possibly
infinite) condition number; a matrix that is singular or
can become singular if some of its entries are changed ever
so slightly.

image (of a vector x under a transformation T): The vector T(x)
assigned to x by T.

implicit description (of a subspace W of $\mathbb{R}^n$): A set of one
or more homogeneous equations that characterize the points
of W.

Im x: The vector in $\mathbb{R}^n$ formed from the imaginary parts of the
entries of a vector x in $\mathbb{C}^n$.

inconsistent linear system: A linear system with no solution.

indefinite matrix: A symmetric matrix A such that $\mathbf{x}^T A\mathbf{x}$ assumes
both positive and negative values.

indefinite quadratic form: A quadratic form Q such that Q(x)
assumes both positive and negative values.

infinite-dimensional (vector space): A nonzero vector space V
that has no finite basis.

inner product: The scalar $\mathbf{u}^T\mathbf{v}$, usually written as $\mathbf{u} \cdot \mathbf{v}$, where u
and v are vectors in $\mathbb{R}^n$ viewed as $n \times 1$ matrices. Also called
the dot product of u and v. In general, a function on a vector
space that assigns to each pair of vectors u and v a number
$\langle \mathbf{u}, \mathbf{v} \rangle$, subject to certain axioms. See Section 6.7.

inner product space: A vector space on which is defined an 
inner product. 

input-output matrix: See consumption matrix. 

input-output model: See Leontief input-output model. 

interior point (of a set S in $\mathbb{R}^n$): A point p in S such that for
some $\delta > 0$, the open ball $B(\mathbf{p}, \delta)$ centered at p is contained
in S.

intermediate demands: Demands for goods or services that 
will be consumed in the process of producing other goods 
and services for consumers. If x is the production level and 
C is the consumption matrix, then Cx lists the intermediate 
demands. 

interpolating polynomial: A polynomial whose graph passes
through every point in a set of data points in $\mathbb{R}^2$.

invariant subspace (for A): A subspace H such that Ax is in
H whenever x is in H.

inverse (of an $n \times n$ matrix A): An $n \times n$ matrix $A^{-1}$ such that
$AA^{-1} = A^{-1}A = I_n$.

inverse power method: An algorithm for estimating an eigenvalue
$\lambda$ of a square matrix, when a good initial estimate of $\lambda$
is available.

invertible linear transformation: A linear transformation
$T : \mathbb{R}^n \to \mathbb{R}^n$ such that there exists a function $S : \mathbb{R}^n \to \mathbb{R}^n$
satisfying both $T(S(\mathbf{x})) = \mathbf{x}$ and $S(T(\mathbf{x})) = \mathbf{x}$ for all x
in $\mathbb{R}^n$.

invertible matrix: A square matrix that possesses an inverse. 

isomorphic vector spaces: Two vector spaces V and W for 
which there is a one-to-one linear transformation T that maps 
V onto W. 

isomorphism: A one-to-one linear mapping from one vector 
space onto another. 


K 

kernel (of a linear transformation $T : V \to W$): The set of x in
V such that $T(\mathbf{x}) = \mathbf{0}$.

Kirchhoff’s laws: (1) (voltage law) The algebraic sum of the 
RI voltage drops in one direction around a loop equals the 
algebraic sum of the voltage sources in the same direction 
around the loop. (2) (current law) The current in a branch is 
the algebraic sum of the loop currents flowing through that 
branch. 


L 

ladder network: An electrical network assembled by connecting
in series two or more electrical circuits.

leading entry: The leftmost nonzero entry in a row of a matrix. 

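least-squares error: The distance $\|\mathbf{b} - A\hat{\mathbf{x}}\|$ from b to $A\hat{\mathbf{x}}$,
when $\hat{\mathbf{x}}$ is a least-squares solution of $A\mathbf{x} = \mathbf{b}$.

least-squares line: The line $y = \beta_0 + \beta_1 x$ that minimizes the
least-squares error in the equation $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$.


least-squares solution (of $A\mathbf{x} = \mathbf{b}$): A vector $\hat{\mathbf{x}}$ such that
$\|\mathbf{b} - A\hat{\mathbf{x}}\| \le \|\mathbf{b} - A\mathbf{x}\|$ for all x in $\mathbb{R}^n$.

left inverse (of A): Any rectangular matrix C such that
$CA = I$.


left-multiplication (by A): Multiplication of a vector or matrix
on the left by A.


left singular vectors (of A): The columns of U in the singular
value decomposition $A = U\Sigma V^T$.

length (or norm, of v): The scalar $\|\mathbf{v}\| = \sqrt{\mathbf{v} \cdot \mathbf{v}} = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle}$.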


Leontief exchange (or closed) model: A model of an economy 
where inputs and outputs are fixed, and where a set of prices 
for the outputs of the sectors is sought such that the income 
of each sector equals its expenditures. This “equilibrium” 
condition is expressed as a system of linear equations, with 
the prices as the unknowns. 


Leontief input-output model (or Leontief production equation):
The equation $\mathbf{x} = C\mathbf{x} + \mathbf{d}$, where x is production, d
is final demand, and C is the consumption (or input-output)
matrix. The jth column of C lists the inputs that sector j
consumes per unit of output.


level set (or gradient) of a linear functional f on $\mathbb{R}^n$: A set
$[f : d] = \{\mathbf{x} \in \mathbb{R}^n : f(\mathbf{x}) = d\}$.


linear combination: A sum of scalar multiples of vectors. The 
scalars are called the weights. 


linear dependence relation: A homogeneous vector equation 
where the weights are all specified and at least one weight is 
nonzero. 


linear equation (in the variables x u ... ,x n ): An equation that 
can be written in the form a\X\ + a 2 x 2 + • • • + a n x n = b, 
where b and the coefficients a\,... ,a n are real or complex 
numbers. 


linear filter: A linear difference equation used to transform 
discrete-time signals. 

linear functional (on $\mathbb{R}^n$): A linear transformation f from $\mathbb{R}^n$ 
into $\mathbb{R}$. 

linearly dependent (vectors): An indexed set $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ with 
the property that there exist weights $c_1, \ldots, c_p$, not all zero, 
such that $c_1\mathbf{v}_1 + \cdots + c_p\mathbf{v}_p = \mathbf{0}$. That is, the vector equation 
$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_p\mathbf{v}_p = \mathbf{0}$ has a nontrivial solution. 

linearly independent (vectors): An indexed set $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ 
with the property that the vector equation 
$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_p\mathbf{v}_p = \mathbf{0}$ has only the trivial solution, 
$c_1 = \cdots = c_p = 0$. 

linear model (in statistics): Any equation of the form 
$\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$, where X and y are known and $\boldsymbol{\beta}$ is to be chosen 
to minimize the length of the residual vector, $\boldsymbol{\epsilon}$. 

linear system: A collection of one or more linear equations 
involving the same variables, say, $x_1, \ldots, x_n$. 

linear transformation T (from a vector space V into a vector 
space W): A rule T that assigns to each vector 
x in V a unique vector T(x) in W, such that (i) 
T(u + v) = T(u) + T(v) for all u, v in V, and (ii) 
T(cu) = cT(u) for all u in V and all scalars c. Notation: 
$T : V \to W$; also, $\mathbf{x} \mapsto A\mathbf{x}$ when $T : \mathbb{R}^n \to \mathbb{R}^m$ and A is the 
standard matrix for T. 

line through p parallel to v: The set $\{\mathbf{p} + t\mathbf{v} : t \text{ in } \mathbb{R}\}$. 

loop current: The amount of electric current flowing through a 
loop that makes the algebraic sum of the RI voltage drops 
around the loop equal to the algebraic sum of the voltage 
sources in the loop. 

lower triangular matrix: A matrix with zeros above the main 
diagonal. 

lower triangular part (of A): A lower triangular matrix whose 
entries on the main diagonal and below agree with those 
in A. 


LU factorization: The representation of a matrix A in the form 
A = LU where L is a square lower triangular matrix with 
ones on the diagonal (a unit lower triangular matrix) and U 
is an echelon form of A. 



magnitude (of a vector): See norm. 

main diagonal (of a matrix): The entries with equal row and 
column indices. 

mapping: See transformation. 

Markov chain: A sequence of probability vectors $\mathbf{x}_0, \mathbf{x}_1, 
\mathbf{x}_2, \ldots$, together with a stochastic matrix P such that 
$\mathbf{x}_{k+1} = P\mathbf{x}_k$ for $k = 0, 1, 2, \ldots$ 
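
The recurrence in this definition can be sketched in a few lines. The stochastic matrix `P` and the initial vector `x0` below are made-up illustration values, and `markov_step` and `markov_chain` are hypothetical helper names; the sketch assumes the columns of `P` are probability vectors.

```python
from fractions import Fraction


def markov_step(P, x):
    """Return P x, the next state vector of the chain."""
    return [sum(P[i][j] * x[j] for j in range(len(x))) for i in range(len(P))]


def markov_chain(P, x0, steps):
    """Return the list [x0, x1, ..., x_steps] with x_{k+1} = P x_k."""
    chain = [x0]
    for _ in range(steps):
        chain.append(markov_step(P, chain[-1]))
    return chain


# Hypothetical 2-state chain; each column of P sums to one.
P = [[Fraction(9, 10), Fraction(1, 2)],
     [Fraction(1, 10), Fraction(1, 2)]]
x0 = [Fraction(1), Fraction(0)]
chain = markov_chain(P, x0, 2)
for k, xk in enumerate(chain):
    print(k, xk)
```

Because each column of `P` sums to one, every vector in the chain remains a probability vector.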

matrix: A rectangular array of numbers. 

matrix equation: An equation that involves at least one matrix; 
for instance, Ax = b. 

matrix for T relative to bases $\mathcal{B}$ and $\mathcal{C}$: A matrix M for 
a linear transformation $T : V \to W$ with the property that 
$[T(\mathbf{x})]_{\mathcal{C}} = M[\mathbf{x}]_{\mathcal{B}}$ for all x in V, where $\mathcal{B}$ is a basis for V and 
$\mathcal{C}$ is a basis for W. When W = V and $\mathcal{C} = \mathcal{B}$, the matrix M 
is called the $\mathcal{B}$-matrix for T and is denoted by $[T]_{\mathcal{B}}$. 

matrix of observations: A p x N matrix whose columns are 
observation vectors, each column listing p measurements 
made on an individual or object in a specified population 
or set. 

matrix transformation: A mapping $\mathbf{x} \mapsto A\mathbf{x}$, where A is an 
m x n matrix and x represents any vector in $\mathbb{R}^n$. 

maximal linearly independent set (in V): A linearly independent 
set B in V such that if a vector v in V but not in B is 
added to B, then the new set is linearly dependent. 

mean-deviation form (of a matrix of observations): A matrix 
whose row vectors are in mean-deviation form. For each 
row, the entries sum to zero. 

mean-deviation form (of a vector): A vector whose entries sum 
to zero. 

mean square error: The error of an approximation in an inner 
product space, where the inner product is defined by a definite 
integral. 


migration matrix: A matrix that gives the percentage movement 
between different locations, from one period to the 
next. 

minimal spanning set (for a subspace H): A set B that spans 
H and has the property that if one of the elements of B is 
removed from B, then the new set does not span H. 

m x n matrix: A matrix with m rows and n columns. 

Moore-Penrose inverse: See pseudoinverse. 

multiple regression: A linear model involving several independent 
variables and one dependent variable. 

N 


nearly singular matrix: An ill-conditioned matrix. 

negative definite matrix: A symmetric matrix A such that 
$\mathbf{x}^T A\mathbf{x} < 0$ for all $\mathbf{x} \neq \mathbf{0}$. 

negative definite quadratic form: A quadratic form Q such 
that $Q(\mathbf{x}) < 0$ for all $\mathbf{x} \neq \mathbf{0}$. 

negative semidefinite matrix: A symmetric matrix A such that 
$\mathbf{x}^T A\mathbf{x} \le 0$ for all x. 


negative semidefinite quadratic form: A quadratic form Q 
such that $Q(\mathbf{x}) \le 0$ for all x. 

nonhomogeneous equation: An equation of the form Ax = b 
with $\mathbf{b} \neq \mathbf{0}$, possibly written as a vector equation or as a 
system of linear equations. 

nonsingular (matrix): An invertible matrix. 

nontrivial solution: A nonzero solution of a homogeneous 
equation or system of homogeneous equations. 

nonzero (matrix or vector): A matrix (with possibly only one 
row or column) that contains at least one nonzero entry. 

norm (or length, of v): The scalar $\|\mathbf{v}\| = \sqrt{\mathbf{v} \cdot \mathbf{v}} = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle}$. 

normal equations: The system of equations represented by 
$A^T A\mathbf{x} = A^T\mathbf{b}$, whose solution yields all least-squares 
solutions of Ax = b. In statistics, a common notation is 
$X^T X\boldsymbol{\beta} = X^T\mathbf{y}$. 
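
To make the normal equations concrete, the following sketch fits a line $y = \beta_0 + \beta_1 x$ to three made-up data points by forming $X^T X$ and $X^T\mathbf{y}$ and solving the resulting 2 x 2 system. The data and the helper names `normal_equations` and `solve_2x2` are assumptions for illustration; the solver uses Cramer's rule and so applies only to the two-parameter case with integer data.

```python
from fractions import Fraction


def normal_equations(A, b):
    """Return (A^T A, A^T b) for an m x n matrix A (list of rows) and vector b."""
    m, n = len(A), len(A[0])
    AtA = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
           for i in range(n)]
    Atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    return AtA, Atb


def solve_2x2(M, v):
    """Solve a 2 x 2 system M z = v by Cramer's rule (M must be invertible)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if det == 0:
        raise ValueError("matrix is singular")
    return [Fraction(v[0] * M[1][1] - M[0][1] * v[1], det),
            Fraction(M[0][0] * v[1] - M[1][0] * v[0], det)]


# Hypothetical data points (0, 1), (1, 2), (2, 4); design matrix rows are [1, x].
X = [[1, 0], [1, 1], [1, 2]]
y = [1, 2, 4]
XtX, Xty = normal_equations(X, y)
beta = solve_2x2(XtX, Xty)
print(XtX, Xty, beta)
```

Solving the normal equations directly is fine for small, well-conditioned problems; a QR factorization is generally the more reliable route numerically.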


normalizing (a nonzero vector v): The process of creating a unit 
vector u that is a positive multiple of v. 


normal vector (to a subspace V of $\mathbb{R}^n$): A vector n in $\mathbb{R}^n$ such 
that $\mathbf{n} \cdot \mathbf{x} = 0$ for all x in V. 


null space (of an m x n matrix A): The set Nul A of all solutions 
to the homogeneous equation Ax = 0. Nul A = {x : x is in 
$\mathbb{R}^n$ and Ax = 0}. 



observation vector: The vector y in the linear model 
$\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$, where the entries in y are the observed values 
of a dependent variable. 


one-to-one (mapping): A mapping $T : \mathbb{R}^n \to \mathbb{R}^m$ such that 
each b in $\mathbb{R}^m$ is the image of at most one x in $\mathbb{R}^n$. 


onto (mapping): A mapping $T : \mathbb{R}^n \to \mathbb{R}^m$ such that each b in 
$\mathbb{R}^m$ is the image of at least one x in $\mathbb{R}^n$. 

open ball $B(\mathbf{p}, \delta)$ in $\mathbb{R}^n$: The set $\{\mathbf{x} : \|\mathbf{x} - \mathbf{p}\| < \delta\}$ in $\mathbb{R}^n$, where 
$\delta > 0$. 


open set S in $\mathbb{R}^n$: A set that contains none of its boundary 
points. (Equivalently, S is open if every point of S is an 
interior point.) 

origin: The zero vector. 

orthogonal basis: A basis that is also an orthogonal set. 

orthogonal complement (of W): The set $W^\perp$ of all vectors 
orthogonal to W. 

orthogonal decomposition: The representation of a vector y 
as the sum of two vectors, one in a specified subspace 
W and the other in $W^\perp$. In general, a decomposition 
$\mathbf{y} = c_1\mathbf{u}_1 + \cdots + c_p\mathbf{u}_p$, where $\{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$ is an orthogonal 
basis for a subspace that contains y. 

orthogonally diagonalizable (matrix): A matrix A that admits 
a factorization, $A = PDP^{-1}$, with P an orthogonal matrix 
($P^{-1} = P^T$) and D diagonal. 

orthogonal matrix: A square invertible matrix U such that 
$U^{-1} = U^T$. 


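orthogonal projection of y onto u (or onto the line through u and 
the origin, for $\mathbf{u} \neq \mathbf{0}$): The vector $\hat{\mathbf{y}}$ defined by 
$\hat{\mathbf{y}} = \dfrac{\mathbf{y} \cdot \mathbf{u}}{\mathbf{u} \cdot \mathbf{u}}\,\mathbf{u}$. 

orthogonal projection of y onto W: The unique vector $\hat{\mathbf{y}}$ in W 
such that $\mathbf{y} - \hat{\mathbf{y}}$ is orthogonal to W. Notation: $\hat{\mathbf{y}} = \operatorname{proj}_W \mathbf{y}$. 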
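
The projection formula onto a single vector u can be sketched as follows. The vectors `y` and `u` are made-up integer examples and `project_onto` is a hypothetical helper name; the code assumes integer entries so that exact fractions can be used.

```python
from fractions import Fraction


def dot(a, b):
    """Return the dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def project_onto(y, u):
    """Return the orthogonal projection of y onto the line through u and 0."""
    uu = dot(u, u)
    if uu == 0:
        raise ValueError("u must be nonzero")
    scale = Fraction(dot(y, u), uu)  # the weight (y . u) / (u . u)
    return [scale * ui for ui in u]


# Hypothetical vectors chosen for illustration.
y = [7, 6]
u = [4, 2]
y_hat = project_onto(y, u)
residual = [yi - hi for yi, hi in zip(y, y_hat)]
print(y_hat, residual, dot(residual, u))
```

The final dot product checks the defining property: the difference between y and its projection is orthogonal to u.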

orthogonal set: A set S of vectors such that $\mathbf{u} \cdot \mathbf{v} = 0$ for each 
distinct pair u, v in S. 

orthogonal to W: Orthogonal to every vector in W . 

orthonormal basis: A basis that is an orthogonal set of unit 
vectors. 


orthonormal set: An orthogonal set of unit vectors. 

outer product: A matrix product $\mathbf{u}\mathbf{v}^T$ where u and v are vectors 
in $\mathbb{R}^n$ viewed as n x 1 matrices. (The transpose symbol is on 
the “outside” of the symbols u and v.) 

overdetermined system: A system of equations with more 
equations than unknowns. 


P 

parallel flats: Two or more flats such that each flat is a translate 
of the other flats. 

parallelogram rule for addition: A geometric interpretation of 
the sum of two vectors u, v as the diagonal of the parallelogram 
determined by u, v, and 0. 

parameter vector: The unknown vector $\boldsymbol{\beta}$ in the linear model 
$\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$. 

parametric equation of a line: An equation of the form 
$\mathbf{x} = \mathbf{p} + t\mathbf{v}$ (t in $\mathbb{R}$). 

parametric equation of a plane: An equation of the form 
$\mathbf{x} = \mathbf{p} + s\mathbf{u} + t\mathbf{v}$ (s, t in $\mathbb{R}$), with u and v linearly 
independent. 

partitioned matrix (or block matrix): A matrix whose entries 
are themselves matrices of appropriate sizes. 


permuted lower triangular matrix: A matrix such that a 
permutation of its rows will form a lower triangular matrix. 

permuted LU factorization: The representation of a matrix A 
in the form A = LU where L is a square matrix such that 
a permutation of its rows will form a unit lower triangular 
matrix, and U is an echelon form of A. 


pivot: A nonzero number that either is used in a pivot position 
to create zeros through row operations or is changed into a 
leading 1, which in turn is used to create zeros. 

pivot column: A column that contains a pivot position. 

pivot position: A position in a matrix A that corresponds to a 
leading entry in an echelon form of A . 

plane through u, v, and the origin: A set whose parametric 
equation is $\mathbf{x} = s\mathbf{u} + t\mathbf{v}$ (s, t in $\mathbb{R}$), with u and v linearly 
independent. 


polar decomposition (of A): A factorization A = PQ, where 
P is an n x n positive semidefinite matrix with the same rank 
as A, and Q is an n x n orthogonal matrix. 


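polygon: A polytope in $\mathbb{R}^2$. 

polyhedron: A polytope in $\mathbb{R}^3$. 

polytope: The convex hull of a finite set of points in $\mathbb{R}^n$ (a 
special type of compact convex set). 

positive combination (of points $\mathbf{v}_1, \ldots, \mathbf{v}_m$ in $\mathbb{R}^n$): A linear 
combination $c_1\mathbf{v}_1 + \cdots + c_m\mathbf{v}_m$, where all $c_i \ge 0$. 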


positive definite matrix: A symmetric matrix A such that 
$\mathbf{x}^T A\mathbf{x} > 0$ for all $\mathbf{x} \neq \mathbf{0}$. 

positive definite quadratic form: A quadratic form Q such 
that $Q(\mathbf{x}) > 0$ for all $\mathbf{x} \neq \mathbf{0}$. 

positive hull (of a set S): The set of all positive combinations 
of points in S, denoted by pos S. 

positive semidefinite matrix: A symmetric matrix A such that 
$\mathbf{x}^T A\mathbf{x} \ge 0$ for all x. 


positive semidefinite quadratic form: A quadratic form Q 
such that $Q(\mathbf{x}) \ge 0$ for all x. 

power method: An algorithm for estimating a strictly dominant 
eigenvalue of a square matrix. 
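
A minimal sketch of one common form of the power method is shown below: multiply repeatedly by A and rescale by the entry of largest absolute value. The matrix, the starting vector, the iteration count, and the function name `power_method` are illustrative assumptions, and the iteration is only expected to converge when A has a strictly dominant eigenvalue and the starting vector has a component along the corresponding eigenvector.

```python
def power_method(A, x0, iterations=50):
    """Estimate a strictly dominant eigenvalue of A and a matching eigenvector.

    Each step computes A x and rescales so the largest entry (in absolute
    value) is 1; the scaling factors mu approach the dominant eigenvalue.
    """
    x = list(x0)
    mu = 0.0
    for _ in range(iterations):
        Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
        mu = max(Ax, key=abs)  # entry of largest absolute value
        if mu == 0:
            raise ValueError("iteration reached the zero vector")
        x = [v / mu for v in Ax]
    return mu, x


# Hypothetical example: this matrix has eigenvalues 7 and 1.
A = [[6.0, 5.0],
     [1.0, 2.0]]
mu, x = power_method(A, [0.0, 1.0])
print(mu, x)
```

The rate of convergence depends on the ratio of the second-largest eigenvalue magnitude to the largest, which is 1/7 in this example.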

principal axes (of a quadratic form $\mathbf{x}^T A\mathbf{x}$): The orthonormal 
columns of an orthogonal matrix P such that $P^{-1}AP$ is 
diagonal. (These columns are unit eigenvectors of A.) Usually 
the columns of P are ordered in such a way that the 
corresponding eigenvalues of A are arranged in decreasing 
order of magnitude. 


principal components (of the data in a matrix B of 
observations): The unit eigenvectors of a sample 
covariance matrix S for B, with the eigenvectors arranged 
so that the corresponding eigenvalues of S decrease in 
magnitude. If B is in mean-deviation form, then the principal 
components are the right singular vectors in a singular value 
decomposition of $B^T$. 


probability vector: A vector in $\mathbb{R}^n$ whose entries are nonnegative 
and sum to one. 

product Ax: The linear combination of the columns of A using 
the corresponding entries in x as weights. 

production vector: The vector in the Leontief input-output 
model that lists the amounts that are to be produced by the 
various sectors of an economy. 


profile (of a set S in $\mathbb{R}^n$): The set of extreme points of S. 


projection matrix (or orthogonal projection matrix): A symmetric 
matrix B such that $B^2 = B$. A simple example is 
$B = \mathbf{v}\mathbf{v}^T$, where v is a unit vector. 


proper subset of a set S: A subset of S that does not equal S 
itself. 


proper subspace: Any subspace of a vector space V other than 
V itself. 

pseudoinverse (of A): The matrix $VD^{-1}U^T$, when $UDV^T$ is a 
reduced singular value decomposition of A. 


Q 

QR factorization: A factorization of an m x n matrix A with 
linearly independent columns, A = QR , where Q is an 
m x n matrix whose columns form an orthonormal basis for 
Col A , and R is an n x n upper triangular invertible matrix 
with positive entries on its diagonal. 

quadratic Bezier curve: A curve whose description may be 
written in the form $\mathbf{g}(t) = (1 - t)\mathbf{f}_0(t) + t\,\mathbf{f}_1(t)$ for $0 \le t \le 1$, 
where $\mathbf{f}_0(t) = (1 - t)\mathbf{p}_0 + t\,\mathbf{p}_1$ and $\mathbf{f}_1(t) = (1 - t)\mathbf{p}_1 + t\,\mathbf{p}_2$. 
The points $\mathbf{p}_0, \mathbf{p}_1, \mathbf{p}_2$ are called the control points for 
the curve. 

quadratic form: A function Q defined for x in $\mathbb{R}^n$ by 
$Q(\mathbf{x}) = \mathbf{x}^T A\mathbf{x}$, where A is an n x n symmetric matrix (called the 
matrix of the quadratic form). 

R 

range (of a linear transformation T): The set of all vectors of 
the form T (x) for some x in the domain of T . 

rank (of a matrix A): The dimension of the column space of A, 
denoted by rank A. 

Rayleigh quotient: $R(\mathbf{x}) = (\mathbf{x}^T A\mathbf{x})/(\mathbf{x}^T\mathbf{x})$. An estimate of an 
eigenvalue of A (usually a symmetric matrix). 

recurrence relation: See difference equation. 

reduced echelon form (or reduced row echelon form): A 

reduced echelon matrix that is row equivalent to a given 
matrix. 

reduced echelon matrix: A rectangular matrix in echelon form 
that has these additional properties: The leading entry in each 
nonzero row is 1, and each leading 1 is the only nonzero entry 
in its column. 

reduced singular value decomposition: A factorization 
$A = UDV^T$, for an m x n matrix A of rank r, where U is 
m x r with orthonormal columns, D is an r x r diagonal 
matrix with the r nonzero singular values of A on its 
diagonal, and V is n x r with orthonormal columns. 


regression coefficients: The coefficients $\beta_0$ and $\beta_1$ in the 
least-squares line $y = \beta_0 + \beta_1 x$. 

regular solid: One of the five possible regular polyhedrons 
in $\mathbb{R}^3$: the tetrahedron (4 equal triangular faces), the cube 
(6 square faces), the octahedron (8 equal triangular faces), 
the dodecahedron (12 equal pentagonal faces), and the 
icosahedron (20 equal triangular faces). 

regular stochastic matrix: A stochastic matrix P such that 
some matrix power $P^k$ contains only strictly positive entries. 

relative change or relative error (in b): The quantity 
$\|\Delta\mathbf{b}\|/\|\mathbf{b}\|$ when b is changed to $\mathbf{b} + \Delta\mathbf{b}$. 

repellor (of a dynamical system in $\mathbb{R}^2$): The origin when all 
trajectories except the constant zero sequence or function 
tend away from 0. 

residual vector: The quantity $\boldsymbol{\epsilon}$ that appears in the general 
linear model: $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$; that is, $\boldsymbol{\epsilon} = \mathbf{y} - X\boldsymbol{\beta}$, the 
difference between the observed values and the predicted values 
(of y). 

Re x: The vector in $\mathbb{R}^n$ formed from the real parts of the entries 
of a vector x in $\mathbb{C}^n$. 

right inverse (of A): Any rectangular matrix C such that 
AC = I. 

right-multiplication (by A): Multiplication of a matrix on the 
right by A. 

right singular vectors (of A): The columns of V in the singular 
value decomposition $A = U\Sigma V^T$. 

roundoff error: Error in floating point arithmetic caused when 
the result of a calculation is rounded (or truncated) to the 
number of floating point digits stored. Also, the error that 
results when the decimal representation of a number such as 
1/3 is approximated by a floating point number with a finite 
number of digits. 

row-column rule: The rule for computing a product AB in 
which the (i, j)-entry of AB is the sum of the products of 
corresponding entries from row i of A and column j of B. 

row equivalent (matrices): Two matrices for which there exists 
a (finite) sequence of row operations that transforms one 
matrix into the other. 

row reduction algorithm: A systematic method using elemen¬ 
tary row operations that reduces a matrix to echelon form or 
reduced echelon form. 

row replacement: An elementary row operation that replaces 
one row of a matrix by the sum of the row and a multiple of 
another row. 

row space (of a matrix A): The set Row A of all linear combinations 
of the vectors formed from the rows of A; also denoted 
by Col $A^T$. 

row sum: The sum of the entries in a row of a matrix. 

row vector: A matrix with only one row, or a single row of a 
matrix that has several rows. 

row-vector rule for computing Ax: The rule for computing a 
product Ax in which the ith entry of Ax is the sum of the 
products of corresponding entries from row i of A and from 
the vector x. 



saddle point (of a dynamical system in $\mathbb{R}^2$): The origin when 
some trajectories are attracted to 0 and other trajectories are 
repelled from 0. 

same direction (as a vector v): A vector that is a positive 
multiple of v. 

sample mean: The average M of a set of vectors, $\mathbf{X}_1, \ldots, \mathbf{X}_N$, 
given by $\mathbf{M} = (1/N)(\mathbf{X}_1 + \cdots + \mathbf{X}_N)$. 

scalar: A (real) number used to multiply either a vector or a 
matrix. 

scalar multiple of u by c: The vector cu obtained by multiplying 
each entry in u by c. 

scale (a vector): Multiply a vector (or a row or column of a 
matrix) by a nonzero scalar. 

Schur complement: A certain matrix formed from the blocks 
of a 2 x 2 partitioned matrix $A = [A_{ij}]$. If $A_{11}$ is invertible, 
its Schur complement is given by $A_{22} - A_{21}A_{11}^{-1}A_{12}$. 
If $A_{22}$ is invertible, its Schur complement is given by 
$A_{11} - A_{12}A_{22}^{-1}A_{21}$. 

Schur factorization (of A, for real scalars): A factorization 
$A = URU^T$ of an n x n matrix A having n real eigenvalues, 
where U is an n x n orthogonal matrix and R is an upper 
triangular matrix. 

set spanned by $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$: The set Span $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$. 

signal (or discrete-time signal): A doubly infinite sequence of 
numbers, a function defined on the integers; belongs to 
the vector space $\mathbb{S}$. 

similar (matrices): Matrices A and B such that $P^{-1}AP = B$, 
or equivalently, $A = PBP^{-1}$, for some invertible matrix P. 

similarity transformation: A transformation that changes A 
into $P^{-1}AP$. 


simplex: The convex hull of an affinely independent finite set 
of vectors in $\mathbb{R}^n$. 


singular (matrix): A square matrix that has no inverse. 

singular value decomposition (of an m x n matrix A): 
$A = U\Sigma V^T$, where U is an m x m orthogonal matrix, V is an 
n x n orthogonal matrix, and $\Sigma$ is an m x n matrix with 
nonnegative entries on the main diagonal (arranged in decreasing 
order of magnitude) and zeros elsewhere. If rank A = r, 
then $\Sigma$ has exactly r positive entries (the nonzero singular 
values of A) on the diagonal. 

singular values (of A): The (positive) square roots of the 
eigenvalues of $A^T A$, arranged in decreasing order of magnitude. 

size (of a matrix): Two numbers, written in the form m x n, 
that specify the number of rows ( m ) and columns ( n ) in the 
matrix. 


solution (of a linear system involving variables $x_1, \ldots, x_n$): A 
list $(s_1, s_2, \ldots, s_n)$ of numbers that makes each equation in 
the system a true statement when the values $s_1, \ldots, s_n$ are 
substituted for $x_1, \ldots, x_n$, respectively. 

solution set: The set of all possible solutions of a linear system. 
The solution set is empty when the linear system is 
inconsistent. 

Span $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$: The set of all linear combinations of 
$\mathbf{v}_1, \ldots, \mathbf{v}_p$. Also, the subspace spanned (or generated) by 
$\mathbf{v}_1, \ldots, \mathbf{v}_p$. 

spanning set (for a subspace H): Any set $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ in H 
such that H = Span $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$. 

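spectral decomposition (of A): A representation 

$A = \lambda_1\mathbf{u}_1\mathbf{u}_1^T + \cdots + \lambda_n\mathbf{u}_n\mathbf{u}_n^T$ 

where $\{\mathbf{u}_1, \ldots, \mathbf{u}_n\}$ is an orthonormal basis of eigenvectors 
of A, and $\lambda_1, \ldots, \lambda_n$ are the corresponding eigenvalues of A. 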

spiral point (of a dynamical system in R 2 ): The origin when 
the trajectories spiral about 0. 

stage-matrix model: A difference equation $\mathbf{x}_{k+1} = A\mathbf{x}_k$ where 
$\mathbf{x}_k$ lists the number of females in a population at time k, 
with the females classified by various stages of development 
(such as juvenile, subadult, and adult). 

standard basis: The basis $\mathcal{E} = \{\mathbf{e}_1, \ldots, \mathbf{e}_n\}$ for $\mathbb{R}^n$ consisting 
of the columns of the n x n identity matrix, or the basis 
$\{1, t, \ldots, t^n\}$ for $\mathbb{P}_n$. 

standard matrix (for a linear transformation T): The matrix A 
such that T(x) = Ax for all x in the domain of T. 

standard position: The position of the graph of an equation 
$\mathbf{x}^T A\mathbf{x} = c$, when A is a diagonal matrix. 

state vector: A probability vector. In general, a vector that 
describes the “state” of a physical system, often in connection 
with a difference equation $\mathbf{x}_{k+1} = A\mathbf{x}_k$. 

steady-state vector (for a stochastic matrix P): A probability 
vector q such that Pq = q. 

stiffness matrix: The inverse of a flexibility matrix. The jth 
column of a stiffness matrix gives the loads that must be 
applied at specified points on an elastic beam in order to 
produce a unit deflection at the jth point on the beam. 

stochastic matrix: A square matrix whose columns are 
probability vectors. 

strictly dominant eigenvalue: An eigenvalue $\lambda_1$ of a matrix A 
with the property that $|\lambda_1| > |\lambda_k|$ for all other eigenvalues 
$\lambda_k$ of A. 

submatrix (of A): Any matrix obtained by deleting some rows 
and/or columns of A; also, A itself. 

subspace: A subset H of some vector space V such that H has 
these properties: (1) the zero vector of V is in H; (2) H 
is closed under vector addition; and (3) H is closed under 
multiplication by scalars. 

supporting hyperplane (to a compact convex set S in $\mathbb{R}^n$): A 
hyperplane $H = [f : d]$ such that $H \cap S \neq \emptyset$ and either 
$f(\mathbf{x}) \le d$ for all x in S or $f(\mathbf{x}) \ge d$ for all x in S. 

symmetric matrix: A matrix A such that $A^T = A$. 





system of linear equations (or a linear system): A collection 
of one or more linear equations involving the same set of 
variables, say, $x_1, \ldots, x_n$. 



tetrahedron: A three-dimensional solid object bounded by four 
equal triangular faces, with three faces meeting at each 
vertex. 

total variance: The trace of the covariance matrix S of a matrix 
of observations. 


trace (of a square matrix A): The sum of the diagonal entries in 
A , denoted by tr A. 

trajectory: The graph of a solution $\{\mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2, \ldots\}$ of a 
dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$, often connected by a thin curve 
to make the trajectory easier to see. Also, the graph of x(t) 
for $t \ge 0$, when x(t) is a solution of a differential equation 
$\mathbf{x}'(t) = A\mathbf{x}(t)$. 


transfer matrix: A matrix A associated with an electrical circuit 
having input and output terminals, such that the output 
vector is A times the input vector. 

transformation (or function, or mapping) T from $\mathbb{R}^n$ to 
$\mathbb{R}^m$: A rule that assigns to each vector x in $\mathbb{R}^n$ a 
unique vector T(x) in $\mathbb{R}^m$. Notation: $T : \mathbb{R}^n \to \mathbb{R}^m$. Also, 
$T : V \to W$ denotes a rule that assigns to each x in V a 
unique vector T(x) in W. 

translation (by a vector p): The operation of adding p to a 
vector or to each vector in a given set. 

transpose (of A): An n x m matrix A T whose columns are the 
corresponding rows of the m x n matrix A . 


trend analysis: The use of orthogonal polynomials to fit data, 
with the inner product given by evaluation at a finite set of 
points. 


triangle inequality: $\|\mathbf{u} + \mathbf{v}\| \le \|\mathbf{u}\| + \|\mathbf{v}\|$ for all u, v. 


triangular matrix: A matrix A with either zeros above or zeros 
below the diagonal entries. 

trigonometric polynomial: A linear combination of the constant 
function 1 and sine and cosine functions such as cos nt 
and sin nt. 


trivial solution: The solution x = 0 of a homogeneous equation 
Ax = 0. 


U 

uncorrelated variables: Any two variables $x_i$ and $x_j$ (with 
$i \neq j$) that range over the ith and jth coordinates of the 
observation vectors in an observation matrix, such that the 
covariance $s_{ij}$ is zero. 

underdetermined system: A system of equations with fewer 
equations than unknowns. 

uniqueness question: Asks, “If a solution of a system exists, is 
it unique—that is, is it the only one?” 


unit consumption vector: A column vector in the Leontief 
input-output model that lists the inputs a sector needs for 
each unit of its output; a column of the consumption matrix. 

unit lower triangular matrix: A square lower triangular matrix 
with ones on the main diagonal. 

unit vector: A vector v such that $\|\mathbf{v}\| = 1$. 

upper triangular matrix: A matrix U (not necessarily square) 
with zeros below the diagonal entries $u_{11}, u_{22}, \ldots$. 



Vandermonde matrix: An n x n matrix V or its transpose, 
when V has the form 

$$V = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^{n-1} \end{bmatrix}$$ 

variance (of a variable $x_j$): The diagonal entry $s_{jj}$ in the 
covariance matrix S for a matrix of observations, where $x_j$ varies 
over the jth coordinates of the observation vectors. 

vector: A list of numbers; a matrix with only one column. In 
general, any element of a vector space. 

vector addition: Adding vectors by adding corresponding 
entries. 

vector equation: An equation involving a linear combination 
of vectors with undetermined weights. 

vector space: A set of objects, called vectors, on which two 
operations are defined, called addition and multiplication by 
scalars. Ten axioms must be satisfied. See the first definition 
in Section 4.1. 

vector subtraction: Computing $\mathbf{u} + (-1)\mathbf{v}$ and writing the 
result as $\mathbf{u} - \mathbf{v}$. 



weighted least squares: Least-squares problems with a 
weighted inner product such as 

$\langle \mathbf{x}, \mathbf{y} \rangle = w_1^2 x_1 y_1 + \cdots + w_n^2 x_n y_n$ 


weights: The scalars used in a linear combination. 

Z 

zero subspace: The subspace {0} consisting of only the zero 
vector. 

zero vector: The unique vector, denoted by 0, such that 
u + 0 = u for all u. In $\mathbb{R}^n$, 0 is the vector whose entries are 
all zeros. 







Answers to Odd-Numbered 
Exercises 


Chapter 1 


Section 1.1, page 10 

1. The solution is $(x_1, x_2) = (-8, 3)$, or simply $(-8, 3)$. 
3. (4/7, 9/7) 

5. Replace row 2 by its sum with 3 times row 3, and then 
replace row 1 by its sum with —5 times row 3. 


7. The solution set is empty. 

9. (4, 8, 5, 2) 11. Inconsistent 

13. (5, 3, -1) 15. Consistent 

17. The three lines have one point in common. 

19. $h \neq 2$ 21. All h 

23. Mark a statement True only if the statement is always true. 
Giving you the answers here would defeat the purpose of 
the true-false questions, which is to help you learn to read 
the text carefully. The Study Guide will tell you where to 
look for the answers, but you should not consult it until you 
have made an honest attempt to find the answers yourself. 


25. $k + 2g + h = 0$ 


27. The row reduction of $\begin{bmatrix} 1 & 3 & f \\ c & d & g \end{bmatrix}$ to 
$\begin{bmatrix} 1 & 3 & f \\ 0 & d - 3c & g - cf \end{bmatrix}$ shows that $d - 3c$ must be 
nonzero, since f and g are arbitrary. Otherwise, for some 
choices of f and g the second row could correspond to an 
equation of the form 0 = b, where b is nonzero. Thus 
$d \neq 3c$. 


29. Swap row 1 and row 2; swap row 1 and row 2. 

31. Replace row 3 by row 3 + (—4) row 1; replace row 3 by 
row 3 + (4) row 1. 

33. $4T_1 - T_2 - T_4 = 30$ 
$-T_1 + 4T_2 - T_3 = 60$ 
$-T_2 + 4T_3 - T_4 = 70$ 
$-T_1 - T_3 + 4T_4 = 40$ 

Section 1.2, page 21 


1. Reduced echelon form: a and b. Echelon form: d. Not 
echelon: c. 


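3. $\begin{bmatrix} 1 & 0 & -1 & -2 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}$. Pivot cols 1 and 2: $\begin{bmatrix} 1 & 2 & 3 & 4 \\ 4 & 5 & 6 & 7 \\ 6 & 7 & 8 & 9 \end{bmatrix}$ 

5. $\begin{bmatrix} \blacksquare & * \\ 0 & \blacksquare \end{bmatrix}$, $\begin{bmatrix} \blacksquare & * \\ 0 & 0 \end{bmatrix}$, $\begin{bmatrix} 0 & \blacksquare \\ 0 & 0 \end{bmatrix}$ 

7. 
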

9. $x_1 = 4 + 5x_3$ 
$x_2 = 5 + 6x_3$ 
$x_3$ is free 

$x_2$ is free 
$x_3$ is free 

$x_1 = 5 + 3x_5$ 
$x_2 = 1 + 4x_5$ 
$x_3$ is free 
$x_4 = 4 - 9x_5$ 
$x_5$ is free 

Note: The Study Guide discusses 
the common mistake $x_3 = 0$. 


15. a. Consistent, with a unique solution 
b. Inconsistent 


17. h = 7/2 

19. a. Inconsistent when h = 2 and $k \neq 8$ 

b. A unique solution when $h \neq 2$ 

c. Many solutions when h = 2 and k = 8 

21. Read the text carefully, and write your answers before you 
consult the Study Guide. Remember, a statement is true 
only if it is true in all cases. 

23. Yes. The system is consistent because with three pivots, 
there must be a pivot in the third (bottom) row of the 
coefficient matrix. The reduced echelon form cannot 
contain a row of the form [0 0 0 0 0 1 ]. 

25. If the coefficient matrix has a pivot position in every row, 
then there is a pivot position in the bottom row, and there is 
no room for a pivot in the augmented column. So, the 
system is consistent, by Theorem 2. 




27. If a linear system is consistent, then the solution is unique if 
and only if every column in the coefficient matrix is a pivot 
column; otherwise, there are infinitely many solutions. 

29. An underdetermined system always has more variables than 
equations. There cannot be more basic variables than there 
are equations, so there must be at least one free variable. 
Such a variable may be assigned infinitely many different 
values. If the system is consistent, each different value of a 
free variable will produce a different solution. 


31. Yes, a system of linear equations with more equations than 
unknowns can be consistent. The following system has a 
solution ($x_1 = x_2 = 1$): 

$x_1 + x_2 = 2$ 
$x_1 - x_2 = 0$ 
$3x_1 + 2x_2 = 5$ 


33. [M] $p(t) = 7 + 6t - t^2$ 


Section 1.3, page 32 


1 

1_ 


"5" 

1 

5 

1 

_1 



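5. $x_1\begin{bmatrix} 6 \\ -1 \\ 5 \end{bmatrix} + x_2\begin{bmatrix} -3 \\ 4 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ -7 \\ -5 \end{bmatrix}$, 
$\begin{bmatrix} 6x_1 \\ -x_1 \\ 5x_1 \end{bmatrix} + \begin{bmatrix} -3x_2 \\ 4x_2 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ -7 \\ -5 \end{bmatrix}$, 
$\begin{bmatrix} 6x_1 - 3x_2 \\ -x_1 + 4x_2 \\ 5x_1 \end{bmatrix} = \begin{bmatrix} 1 \\ -7 \\ -5 \end{bmatrix}$ 

$6x_1 - 3x_2 = 1$ 
$-x_1 + 4x_2 = -7$ 
$5x_1 = -5$ 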

Usually the intermediate steps are not displayed. 

7. a = u — 2v, b = 2u — 2v, c = 2u — 3.5v, d = 3u — 4v 



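9. $x_1\begin{bmatrix} 0 \\ 4 \\ -1 \end{bmatrix} + x_2\begin{bmatrix} 1 \\ 6 \\ 3 \end{bmatrix} + x_3\begin{bmatrix} 5 \\ -1 \\ -8 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$ 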


11. Yes, b is a linear combination of $\mathbf{a}_1$, $\mathbf{a}_2$, and $\mathbf{a}_3$. 

13. No, b is not a linear combination of the columns of A . 

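15. Noninteger weights are acceptable, of course, but some 
simple choices are $0 \cdot \mathbf{v}_1 + 0 \cdot \mathbf{v}_2 = \mathbf{0}$, and 

$1 \cdot \mathbf{v}_1 + 0 \cdot \mathbf{v}_2 = \begin{bmatrix} 7 \\ 1 \\ -6 \end{bmatrix}$, $0 \cdot \mathbf{v}_1 + 1 \cdot \mathbf{v}_2 = \begin{bmatrix} -5 \\ 3 \\ 0 \end{bmatrix}$, 

$1 \cdot \mathbf{v}_1 + 1 \cdot \mathbf{v}_2 = \begin{bmatrix} 2 \\ 4 \\ -6 \end{bmatrix}$, $1 \cdot \mathbf{v}_1 - 1 \cdot \mathbf{v}_2 = \begin{bmatrix} 12 \\ -2 \\ -6 \end{bmatrix}$ 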

17. h = -17 

19. Span $\{\mathbf{v}_1, \mathbf{v}_2\}$ is the set of points on the line through $\mathbf{v}_1$ 
and 0. 


21. Hint: Show that $\begin{bmatrix} 2 & 2 & h \\ -1 & 1 & k \end{bmatrix}$ is consistent for all h and 
k. Explain what this calculation shows about Span {u, v}. 


23. Before you consult your Study Guide , read the entire 
section carefully. Pay special attention to definitions and 
theorem statements, and note any remarks that precede or 
follow them. 


25. a. No, three 


b. Yes, infinitely many 


c. $\mathbf{a}_1 = 1 \cdot \mathbf{a}_1 + 0 \cdot \mathbf{a}_2 + 0 \cdot \mathbf{a}_3$ 


27. a. $5\mathbf{v}_1$ is the output of 5 days' operation of mine #1. 

b. The total output is $x_1\mathbf{v}_1 + x_2\mathbf{v}_2$, so $x_1$ and $x_2$ should 
satisfy $x_1\mathbf{v}_1 + x_2\mathbf{v}_2 = \begin{bmatrix} 150 \\ 2825 \end{bmatrix}$. 

c. [M] 1.5 days for mine #1 and 4 days for mine #2 

29. (1.3, .9, 0) 


31. a. 


10/3 


b. Add 3.5 g at (0,1), add .5 g at (8,1), and add 2 g at 

(2,4). 

33. Review Practice Problem 1 and then write a solution. The 
Study Guide has a solution. 


Section 1.4, page 40 


1. The product is not defined because the number of columns 
(2) in the 3 x 2 matrix does not match the number of entries 
(3) in the vector. 



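3. $A\mathbf{x} = \begin{bmatrix} 6 & 5 \\ -4 & -3 \\ 7 & 6 \end{bmatrix}\begin{bmatrix} 2 \\ -3 \end{bmatrix} = 2\begin{bmatrix} 6 \\ -4 \\ 7 \end{bmatrix} - 3\begin{bmatrix} 5 \\ -3 \\ 6 \end{bmatrix} = \begin{bmatrix} 12 \\ -8 \\ 14 \end{bmatrix} + \begin{bmatrix} -15 \\ 9 \\ -18 \end{bmatrix} = \begin{bmatrix} -3 \\ 1 \\ -4 \end{bmatrix}$, and 

$A\mathbf{x} = \begin{bmatrix} 6 & 5 \\ -4 & -3 \\ 7 & 6 \end{bmatrix}\begin{bmatrix} 2 \\ -3 \end{bmatrix} = \begin{bmatrix} 6 \cdot 2 + 5 \cdot (-3) \\ (-4) \cdot 2 + (-3) \cdot (-3) \\ 7 \cdot 2 + 6 \cdot (-3) \end{bmatrix} = \begin{bmatrix} -3 \\ 1 \\ -4 \end{bmatrix}$ 

Show your work here and for Exercises 4-6, but 
thereafter perform the calculations mentally. 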
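
The two computations in this answer (a linear combination of the columns of A, and the row-vector rule) can be checked with a short sketch. The function names `matvec_columns` and `matvec_rows` are hypothetical; the matrix and vector are the ones shown in the answer above.

```python
def matvec_columns(A, x):
    """Compute A x as a linear combination of the columns of A."""
    m = len(A)
    result = [0] * m
    for j, weight in enumerate(x):
        # Add weight times column j of A to the running sum.
        for i in range(m):
            result[i] += weight * A[i][j]
    return result


def matvec_rows(A, x):
    """Compute A x by the row-vector rule: entry i is (row i of A) dotted with x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


A = [[6, 5],
     [-4, -3],
     [7, 6]]
x = [2, -3]
by_columns = matvec_columns(A, x)
by_rows = matvec_rows(A, x)
print(by_columns, by_rows)
```

Both functions return the same vector, which is the point of the exercise: the definition of Ax and the row-vector rule always agree.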













































































13. Yes. (Justify your answer.) 





29. Hint: Start with any 3 x 3 matrix B in echelon form that 
has three pivot positions. 

31. Write your solution before you check the Study Guide. 

33. Hint: How many pivot columns does A have? Why? 

35. Given $A\mathbf{x}_1 = \mathbf{y}_1$ and $A\mathbf{x}_2 = \mathbf{y}_2$, you are asked to show that 
the equation $A\mathbf{x} = \mathbf{w}$ has a solution, where $\mathbf{w} = \mathbf{y}_1 + \mathbf{y}_2$. 
Observe that $\mathbf{w} = A\mathbf{x}_1 + A\mathbf{x}_2$ and use Theorem 5(a) with $\mathbf{x}_1$ 
and $\mathbf{x}_2$ in place of u and v, respectively. That is, 
$\mathbf{w} = A\mathbf{x}_1 + A\mathbf{x}_2 = A(\mathbf{x}_1 + \mathbf{x}_2)$. So the vector $\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_2$ 
is a solution of $\mathbf{w} = A\mathbf{x}$. 

37. [M] The columns do not span $\mathbb{R}^4$. 

39. [M] The columns span $\mathbb{R}^4$. 

41. [M] Delete column 4 of the matrix in Exercise 39. It is also 
possible to delete column 3 instead of column 4. 


Section 1.5, page 48 


15. The equation Ax = b is not consistent when $3b_1 + b_2$ is 
nonzero. (Show your work.) The set of b for which the 
equation is consistent is a line through the origin: the set of 
all points $(b_1, b_2)$ satisfying $b_2 = -3b_1$. 

17. Only three rows contain a pivot position. The equation 
Ax = b does not have a solution for each b in $\mathbb{R}^4$, by 
Theorem 4. 

19. The work in Exercise 17 shows that statement (d) in 
Theorem 4 is false. So all four statements in Theorem 4 are 
false. Thus, not all vectors in $\mathbb{R}^4$ can be written as a linear 
combination of the columns of A. Also, the columns of A 
do not span $\mathbb{R}^4$. 


1. The system has a nontrivial solution because there is a free 
variable, $x_3$. 

3. The system has a nontrivial solution because there is a free 
variable, $x_3$. 





21. The matrix $[\,v_1\ v_2\ v_3\,]$ does not have a pivot in each row,
so the columns of the matrix do not span $\mathbb{R}^4$, by Theorem 4.
That is, $\{v_1, v_2, v_3\}$ does not span $\mathbb{R}^4$.

9. $x = x_2 \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix} + x_3 \begin{bmatrix} -2 \\ 0 \\ 1 \end{bmatrix}$

23. Read the text carefully and try to mark each exercise
statement True or False before you consult the Study Guide.
Several parts of Exercises 23 and 24 are implications of the
form

“If (statement 1), then (statement 2)”

or equivalently,

“(statement 2), if (statement 1)”

Mark such an implication as True if (statement 2) is true in
all cases when (statement 1) is true.

25. $c_1 = -3$, $c_2 = -1$, $c_3 = 2$

27. $Qx = v$, where $Q = [\,q_1\ q_2\ q_3\,]$ and $x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$

Note: If your answer is the equation $Ax = b$, you must
specify what A and b are.


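11. Hint: The system derived from the reduced echelon form
is

\[
\begin{aligned}
x_1 - 4x_2 + 5x_6 &= 0 \\
x_3 - x_6 &= 0 \\
x_5 - 4x_6 &= 0 \\
0 &= 0
\end{aligned}
\]

The basic variables are $x_1$, $x_3$, and $x_5$. The remaining
variables are free. The Study Guide discusses two mistakes
that are often made on this type of problem.

13. $x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 5 \\ -2 \\ 0 \end{bmatrix} + x_3 \begin{bmatrix} 4 \\ -7 \\ 1 \end{bmatrix} = p + x_3 q$. Geometrically, the
solution set is the line through $\begin{bmatrix} 5 \\ -2 \\ 0 \end{bmatrix}$ parallel to $\begin{bmatrix} 4 \\ -7 \\ 1 \end{bmatrix}$.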
15. $x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix} + x_3 \begin{bmatrix} 5 \\ -2 \\ 1 \end{bmatrix}$. The solution set is the
line through $\begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix}$, parallel to the line that is the solution
set of the homogeneous system in Exercise 5.



17. Let $u = \begin{bmatrix} -9 \\ 1 \\ 0 \end{bmatrix}$, $v = \begin{bmatrix} 4 \\ 0 \\ 1 \end{bmatrix}$, $p = \begin{bmatrix} -2 \\ 0 \\ 0 \end{bmatrix}$. The solution of
the homogeneous equation is $x = x_2 u + x_3 v$, the plane
through the origin spanned by u and v. The solution set of
the nonhomogeneous system is $x = p + x_2 u + x_3 v$, the
plane through p parallel to the solution set of the
homogeneous equation.


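19. $x = a + tb$, where t represents a parameter, or

\[
x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} -2 \\ 0 \end{bmatrix} + t\begin{bmatrix} -5 \\ 3 \end{bmatrix}
\quad\text{or}\quad
\begin{cases} x_1 = -2 - 5t \\ x_2 = 3t \end{cases}
\]

21. $x = p + t(q - p)$ =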

23. It is important to read the text carefully and write your 
answers. After that, check the Study Guide , if necessary. 



25. $Av_h = A(w - p) = Aw - Ap = b - b = 0$

27. When A is the 3x3 zero matrix, every x in R 3 satisfies 

Ax = 0. So the solution set is all vectors in R 3 . 

29. a. When A is a 3 x 3 matrix with three pivot positions, the 

equation Ax = 0 has no free variables and hence has no 
nontrivial solution. 

b. With three pivot positions, A has a pivot position in 
each of its three rows. By Theorem 4 in Section 1.4, the 
equation Ax = b has a solution for every possible b. 
The word “possible” in the exercise means that the only 
vectors considered in this case are those in R 3 , because 
A has three rows. 

31. a. When A is a 3 x 2 matrix with two pivot positions, each 

column is a pivot column. So the equation Ax = 0 has 
no free variables and hence no nontrivial solution. 

b. With two pivot positions and three rows, A cannot have 
a pivot in every row. So the equation Ax = b cannot 
have a solution for every possible b (in R 3 ), by 
Theorem 4 in Section 1.4. 


33. One answer: x = 



35. Your example should have the property that the sum of the 
entries in each row is zero. Why? 


37. One answer is A 




. The Study Guide shows how 


to analyze the problem in order to construct A. If b is any
vector not a multiple of the first column of A, then the
solution set of $Ax = b$ is empty and thus cannot be formed
by translating the solution set of $Ax = 0$. This does not
contradict Theorem 6, because that theorem applies when
the equation $Ax = b$ has a nonempty solution set.


39. If c is a scalar, then A(c u) = cAu, by Theorem 5(b) in 
Section 1.4. If u satisfies Ax = 0, then Au = 0, 
cAu = c • 0 = 0, and so A(c u) = 0. 


Section 1.6, page 55 


1. The general solution is $p_{\text{Goods}} = .875\,p_{\text{Services}}$, with $p_{\text{Services}}$
free. One equilibrium solution is $p_{\text{Services}} = 1000$ and
$p_{\text{Goods}} = 875$. Using fractions, the general solution could be
written $p_{\text{Goods}} = (7/8)\,p_{\text{Services}}$, and a natural choice of
prices might be $p_{\text{Services}} = 80$ and $p_{\text{Goods}} = 70$. Only the
ratio of the prices is important. The economic equilibrium
is unaffected by a proportional change in prices.


3. a.

        Distribution of Output From:
        C&M     F&P     Mach.
Output                              Input    Purchased By:
        .2      .8      .4            ->     C&M
        .3      .1      .4            ->     F&P
        .5      .1      .2            ->     Mach.

[M] $p_{\text{Chemicals}} = 141.7$, $p_{\text{Fuels}} = 91.7$, $p_{\text{Machinery}} = 100$.
To two significant figures, $p_{\text{Chemicals}} = 140$, $p_{\text{Fuels}} = 92$,
$p_{\text{Machinery}} = 100$.


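5. $\mathrm{B_2S_3} + 6\,\mathrm{H_2O} \to 2\,\mathrm{H_3BO_3} + 3\,\mathrm{H_2S}$

7. $3\,\mathrm{NaHCO_3} + \mathrm{H_3C_6H_5O_7} \to \mathrm{Na_3C_6H_5O_7} + 3\,\mathrm{H_2O} + 3\,\mathrm{CO_2}$

9. [M] $15\,\mathrm{PbN_6} + 44\,\mathrm{CrMn_2O_8} \to 5\,\mathrm{Pb_3O_4} + 22\,\mathrm{Cr_2O_3} + 88\,\mathrm{MnO_2} + 90\,\mathrm{NO}$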
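The balanced equations above are integer solutions of a homogeneous system: one row per element, one column per compound, with the product columns negated. The following is a minimal sketch of that idea for the boric acid reaction in Exercise 5. The helper names `rref` and `balance` are illustrative, not from the text, and the code assumes the system has exactly one free variable.

```python
from fractions import Fraction
from math import lcm


def rref(matrix):
    """Reduced echelon form with exact fractions; returns (rows, pivot_columns)."""
    m = [[Fraction(v) for v in row] for row in matrix]
    pivots, r = [], 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column, so the variable is free
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [v / pivot for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots


def balance(matrix):
    """Integer solution of matrix @ x = 0, scaled to clear denominators."""
    rows, pivots = rref(matrix)
    n = len(matrix[0])
    free = [c for c in range(n) if c not in pivots]
    if len(free) != 1:
        raise ValueError("expected exactly one free variable")
    x = [Fraction(0)] * n
    x[free[0]] = Fraction(1)
    for row, c in zip(rows, pivots):
        x[c] = -row[free[0]]  # basic variable in terms of the free one
    scale = lcm(*(v.denominator for v in x))
    return [int(v * scale) for v in x]


# Rows: B, S, H, O. Columns: B2S3, H2O, H3BO3, H2S (products negated).
boric = [
    [2, 0, -1, 0],
    [3, 0, 0, -1],
    [0, 2, -3, -2],
    [0, 1, -3, 0],
]
coefficients = balance(boric)
print(coefficients)
```

For this matrix the sketch should reproduce the coefficients 1, 6, 2, 3 of the answer to Exercise 5.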



11. $x_1 = 20 - x_3$
$x_2 = 60 + x_3$
$x_3$ is free
$x_4 = 60$

The largest value of $x_3$ is 20.


13. a. $x_1 = x_3 - 40$
$x_2 = x_3 + 10$
$x_3$ is free
$x_4 = x_6 + 50$
$x_5 = x_6 + 60$
$x_6$ is free

b. $x_2 = 50$
$x_3 = 40$
$x_4 = 50$
$x_5 = 60$






Section 1.7, page 61 

Justify your answers to Exercises 1-22. 


1. Lin. indep. 3. Lin. depen.

5. Lin. indep. 7. Lin. depen.

9. a. No h b. All h
11. h = 6 13. All h

15. Lin. depen. 17. Lin. depen. 19. Lin. indep.

21. If you consult your Study Guide before you make a good 
effort to answer the true-false questions, you will destroy 
most of their value. 









































27. All five columns of the 7x5 matrix A must be pivot 
columns. Otherwise, the equation Ax = 0 would have a 
free variable, in which case the columns of A would be 
linearly dependent. 


29. A: Any 3 x 2 matrix with two nonzero columns such that 
neither column is a multiple of the other. In this case, the 
columns are linearly independent, and so the equation 
Ax = 0 has only the trivial solution. 

B : Any 3x2 matrix with one column a multiple of the 
other. 


31. x = 



33. True, by Theorem 7. (The Study Guide adds another 
justification.) 


35. False. The vector $v_1$ could be the zero vector.


37. True. A linear dependence relation among $v_1$, $v_2$, $v_3$ may be
extended to a linear dependence relation among $v_1$, $v_2$, $v_3$,
$v_4$ by placing a zero weight on $v_4$.


39. You should be able to work this important problem without 
help. Write your solution before you consult the Study 
Guide. 


41. [M] $B = \begin{bmatrix} 8 & -3 & 2 \\ -9 & 4 & -7 \\ 6 & -2 & 4 \\ 5 & -1 & 10 \end{bmatrix}$. Other choices are possible.


43. [M] Each column of A that is not a column of B is in the 
set spanned by the columns of B. 


Section 1.8, page 69 



, unique solution 


5. x = 



, not unique 


7. a = 5, b = 6

9. $x = x_3 \begin{bmatrix} 9 \\ 4 \\ 1 \\ 0 \end{bmatrix} + x_4 \begin{bmatrix} -7 \\ -3 \\ 0 \\ 1 \end{bmatrix}$


11. Yes, because the system represented by [A b] is 
consistent. 



A reflection through the origin 





• u 




1 


A projection onto the x 2 -axis. 

2 x\ — x 2 

5xi + 6x 2 

21. Read the text carefully and write your answers before you 
check the Study Guide. Notice that Exercise 21(e) is a 
sentence of the form 



“(statement 1) if and only if (statement 2)” 

Mark such a sentence as True if (statement 1) is true 
whenever (statement 2) is true and also (statement 2) is true 
whenever (statement 1) is true. 




25. Hint: Show that the image of a line (that is, the set of 
images of all points on a line) can be represented by the 
parametric equation of a line. 

27. a. The line through p and q is parallel to $q - p$. (See
Exercises 21 and 22 in Section 1.5.) Since p is on the
line, the equation of the line is $x = p + t(q - p)$.
Rewrite this as $x = p - tp + tq$ and $x = (1 - t)p + tq$.

b. Consider $x = (1 - t)p + tq$ for t such that $0 \le t \le 1$.
Then, by linearity of T, for $0 \le t \le 1$

\[
T(x) = T((1 - t)p + tq) = (1 - t)T(p) + tT(q) \qquad (*)
\]

If $T(p)$ and $T(q)$ are distinct, then (*) is the equation
for the line segment between $T(p)$ and $T(q)$, as shown
in part (a). Otherwise, the set of images is just the single
point $T(p)$, because

\[
(1 - t)T(p) + tT(q) = (1 - t)T(p) + tT(p) = T(p)
\]


29. a. When b = 0, $f(x) = mx$. In this case, for all x, y in $\mathbb{R}$
and all scalars c and d,

\[
\begin{aligned}
f(cx + dy) &= m(cx + dy) = mcx + mdy \\
&= c(mx) + d(my) = c \cdot f(x) + d \cdot f(y)
\end{aligned}
\]

This shows that f is linear.

b. When $f(x) = mx + b$, with b nonzero,
$f(0) = m(0) + b = b \ne 0$.

c. In calculus, f is called a “linear function” because the
graph of f is a line.


23. Answer the questions before checking the Study Guide. 
Justify your answers to Exercises 25-28. 

25. Not one-to-one and does not map $\mathbb{R}^4$ onto $\mathbb{R}^4$
27. Not one-to-one but maps $\mathbb{R}^3$ onto $\mathbb{R}^2$


31. Hint: Since {vi, v 2 , v 3 } is linearly dependent, you can write 
a certain equation and work with it. 

33. One possibility is to show that T does not map the zero 
vector into the zero vector, something that every linear 
transformation does do: T (0,0) = (0, 4, 0). 





0 

0 


* 

* 




31. n . (Explain why, and then check the Study Guide). 


35. Take u and v in $\mathbb{R}^3$ and let c and d be scalars. Then

\[
cu + dv = (cu_1 + dv_1,\ cu_2 + dv_2,\ cu_3 + dv_3)
\]

The transformation T is linear because

\[
\begin{aligned}
T(cu + dv) &= (cu_1 + dv_1,\ cu_2 + dv_2,\ -(cu_3 + dv_3)) \\
&= (cu_1 + dv_1,\ cu_2 + dv_2,\ -cu_3 - dv_3) \\
&= (cu_1,\ cu_2,\ -cu_3) + (dv_1,\ dv_2,\ -dv_3) \\
&= c(u_1,\ u_2,\ -u_3) + d(v_1,\ v_2,\ -v_3) \\
&= cT(u) + dT(v)
\end{aligned}
\]

37. [M] All multiples of (7, 9,0,2) 

39. [M] Yes. One choice for x is (4,7,1,0). 


Section 1.9, page 79 

"3 -5 

1 1 2 

* 3 0 

1 0 


■- 1 /V 2 

1 /V 2 ' 

9. 

0 

- 1 " 

1 /V 2 

1 /V 2 . 

-1 

2 


11. The described transformation T maps $e_1$ into $-e_1$ and maps
$e_2$ into $-e_2$. A rotation through $\pi$ radians also maps $e_1$ into
$-e_1$ and maps $e_2$ into $-e_2$. Since a linear transformation is
completely determined by what it does to the columns of
the identity matrix, the rotation transformation has the same
effect as T on every vector in $\mathbb{R}^2$.



33. Hint: If $e_j$ is the jth column of $I_n$, then $Be_j$ is the jth
column of B.

35. Hint: Is it possible that m > n? What about m < n?
37. [M] No. (Explain why.) 

39. [M] No. (Explain why.) 


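Section 1.10, page 87

1. a. $x_1 \begin{bmatrix} 110 \\ 4 \\ 20 \\ 2 \end{bmatrix} + x_2 \begin{bmatrix} 130 \\ 3 \\ 18 \\ 5 \end{bmatrix} = \begin{bmatrix} 295 \\ 9 \\ 48 \\ 8 \end{bmatrix}$, where $x_1$ is the
number of servings of Cheerios and $x_2$ is the number of
servings of 100% Natural Cereal.

b. $\begin{bmatrix} 110 & 130 \\ 4 & 3 \\ 20 & 18 \\ 2 & 5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 295 \\ 9 \\ 48 \\ 8 \end{bmatrix}$. Mix 1.5 servings of
Cheerios together with 1 serving of 100% Natural
Cereal.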


3. a. She should mix .99 serving of Mac and Cheese, 1.54 
servings of broccoli, and .79 serving of chicken to get 
her desired nutritional content. 

b. She should mix 1.09 servings of shells and white 
cheddar, .88 serving of broccoli, and 1.03 servings of 
chicken to get her desired nutritional content. Notice 
that this mix contains significantly less broccoli, so she 
should like it better. 
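5. $Ri = v$, $\begin{bmatrix} 11 & -5 & 0 & 0 \\ -5 & 10 & -1 & 0 \\ 0 & -1 & 9 & -2 \\ 0 & 0 & -2 & 10 \end{bmatrix} \begin{bmatrix} I_1 \\ I_2 \\ I_3 \\ I_4 \end{bmatrix} = \begin{bmatrix} 50 \\ -40 \\ 30 \\ -30 \end{bmatrix}$

[M]: $i = \begin{bmatrix} I_1 \\ I_2 \\ I_3 \\ I_4 \end{bmatrix} = \begin{bmatrix} 3.68 \\ -1.90 \\ 2.57 \\ -2.49 \end{bmatrix}$

7. $Ri = v$, $\begin{bmatrix} 12 & -7 & 0 & -4 \\ -7 & 15 & -6 & 0 \\ 0 & -6 & 14 & -5 \\ -4 & 0 & -5 & 13 \end{bmatrix} \begin{bmatrix} I_1 \\ I_2 \\ I_3 \\ I_4 \end{bmatrix} = \begin{bmatrix} 40 \\ 30 \\ 20 \\ -10 \end{bmatrix}$

[M]: $i = \begin{bmatrix} I_1 \\ I_2 \\ I_3 \\ I_4 \end{bmatrix} = \begin{bmatrix} 11.43 \\ 10.55 \\ 8.04 \\ 5.84 \end{bmatrix}$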
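The [M] answers for the loop currents come from solving $Ri = v$ with a matrix program. As a rough stand-in, the sketch below solves the system of Exercise 5 by Gaussian elimination with partial pivoting. The function name `solve` is illustrative, and the matrix entries are the ones read from the answer to Exercise 5, so treat them as an assumption if your copy differs.

```python
def solve(a, b):
    """Solve a square system a x = b by elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for c in range(n):
        # Swap in the row with the largest entry in column c.
        p = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for i in range(c + 1, n):
            factor = aug[i][c] / aug[c][c]
            aug[i] = [x - factor * y for x, y in zip(aug[i], aug[c])]
    # Back substitution on the triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


R = [
    [11, -5, 0, 0],
    [-5, 10, -1, 0],
    [0, -1, 9, -2],
    [0, 0, -2, 10],
]
v = [50, -40, 30, -30]
currents = solve(R, v)
print([round(value, 2) for value in currents])
```

Rounded to two decimals, the result should agree with the loop currents listed for Exercise 5.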


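9. $x_{k+1} = Mx_k$ for k = 0, 1, 2, ..., where

$M = \begin{bmatrix} .93 & .05 \\ .07 & .95 \end{bmatrix}$ and $x_0 = \begin{bmatrix} 800{,}000 \\ 500{,}000 \end{bmatrix}$

The population in 2017 (for k = 2) is $x_2 = \begin{bmatrix} 741{,}720 \\ 558{,}280 \end{bmatrix}$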
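The difference equation in Exercise 9 can be stepped by hand or with a few lines of code. This is a small sketch using the migration matrix and initial vector from the answer; the helper name `step` is illustrative.

```python
M = [[0.93, 0.05],
     [0.07, 0.95]]
x0 = [800_000.0, 500_000.0]


def step(m, x):
    """One application of x_{k+1} = M x_k."""
    return [sum(m[i][j] * x[j] for j in range(len(x))) for i in range(len(m))]


x1 = step(M, x0)
x2 = step(M, x1)
print([round(value) for value in x2])
```

Each column of M sums to 1, so the total population (1,300,000) is unchanged from step to step, which is a quick sanity check on the arithmetic.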


11. a. $M = \begin{bmatrix} .98033 & .00179 \\ .01967 & .99821 \end{bmatrix}$

b. [M] $x_{10} = \begin{bmatrix} 35.729 \\ 278.18 \end{bmatrix}$


13. [M] 


a. The population of the city decreases. After 7 years, the 
populations are about equal, but the city population 
continues to decline. After 20 years, there are only 
417,000 persons in the city (417,456 rounded off). 
However, the changes in population seem to grow 
smaller each year. 

b. The city population is increasing slowly, and the 
suburban population is decreasing. After 20 years, the 
city population has grown from 350,000 to about 
370,000. 


Chapter 1 Supplementary Exercises, page 89


1. a. F b. F c. T d. F e. T f. T g. F

h. F i. T j. F k. T l. F m. T n. T

o. T p. T q. F r. T s. F t. T u. F

v. F w. T x. T y. T z. F

3. a. Any consistent linear system whose echelon form is 


■ 

* 

* 

* 


■ 

* 

* 

* 

0 

■ 

* 

* 

or 

0 

0 

■ 

* 

_ 0 

0 

0 

0 _ 


_ 0 

0 

0 

0 _ 


or 


0 

0 

0 


* 


0 

0 


0 


* 

* 

0 


b. Any consistent linear system whose reduced echelon 
form is / 3 . 


9 


10 


11 


c. Any inconsistent linear system of three equations in 
three variables. 


5. a. The solution set: (i) is empty if h = 12 and $k \ne 2$; (ii)
contains a unique solution if $h \ne 12$; (iii) contains
infinitely many solutions if h = 12 and k = 2.

b. The solution set is empty if k + 3h = 0; otherwise, the
solution set contains a unique solution.


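7. a. Set $v_1 = \begin{bmatrix} 2 \\ -5 \\ 7 \end{bmatrix}$, $v_2 = \begin{bmatrix} -4 \\ 1 \\ -5 \end{bmatrix}$, $v_3 = \begin{bmatrix} -2 \\ 1 \\ -3 \end{bmatrix}$, and
$b = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$. “Determine if $v_1$, $v_2$, $v_3$ span $\mathbb{R}^3$.”
Solution: No.

b. Set $A = \begin{bmatrix} 2 & -4 & -2 \\ -5 & 1 & 1 \\ 7 & -5 & -3 \end{bmatrix}$. “Determine if the
columns of A span $\mathbb{R}^3$.”

c. Define $T(x) = Ax$. “Determine if T maps $\mathbb{R}^3$ onto $\mathbb{R}^3$.”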


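\[
\begin{bmatrix} 5 \\ 6 \end{bmatrix} = \frac{4}{3}\begin{bmatrix} 2 \\ 1 \end{bmatrix} + \frac{7}{3}\begin{bmatrix} 1 \\ 2 \end{bmatrix}
\quad\text{or}\quad
\begin{bmatrix} 5 \\ 6 \end{bmatrix} = \begin{bmatrix} 8/3 \\ 4/3 \end{bmatrix} + \begin{bmatrix} 7/3 \\ 14/3 \end{bmatrix}
\]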

Hint: Construct a “grid” on the $x_1x_2$-plane determined by
$a_1$ and $a_2$.

A solution set is a line when the system has one free
variable. If the coefficient matrix is 2 x 3, then two of the
columns should be pivot columns. For instance, take
$\begin{bmatrix} 1 & 2 & * \\ 0 & 3 & * \end{bmatrix}$. Put anything in column 3. The resulting
matrix will be in echelon form. Make one row replacement
operation on the second row to create a matrix not in
echelon form, such as

\[
\begin{bmatrix} 1 & 2 & 1 \\ 0 & 3 & 1 \end{bmatrix} \sim \begin{bmatrix} 1 & 2 & 1 \\ 1 & 5 & 2 \end{bmatrix}
\]


12. Hint: How many free variables are in the equation Ax = 0? 


13. $E = \begin{bmatrix} 1 & 0 & -3 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{bmatrix}$


15. a. If the three vectors are linearly independent, then a, c,
and f must all be nonzero.

b. The numbers a, ..., f can have any values.

16. Hint: List the columns from right to left as $v_1, \ldots, v_4$.

17. Hint: Use Theorem 7.

19. Let M be the line through the origin that is parallel to the
line through $v_1$, $v_2$, and $v_3$. Then $v_2 - v_1$ and $v_3 - v_1$ are
both on M. So one of these two vectors is a multiple of the
other, say $v_2 - v_1 = k(v_3 - v_1)$. This equation produces a
linear dependence relation: $(k - 1)v_1 + v_2 - kv_3 = 0$.

A second solution: A parametric equation of the line is
$x = v_1 + t(v_2 - v_1)$. Since $v_3$ is on the line, there is some $t_0$
such that $v_3 = v_1 + t_0(v_2 - v_1) = (1 - t_0)v_1 + t_0 v_2$. So $v_3$
is a linear combination of $v_1$ and $v_2$, and $\{v_1, v_2, v_3\}$ is
linearly dependent.
21. $\begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$


23. a = 4/5 and b = —3/5 


25. a. The vector lists the number of three-, two-, and
one-bedroom apartments provided when $x_1$ floors of
plan A are constructed.

b. $x_1 \begin{bmatrix} 3 \\ 7 \\ 8 \end{bmatrix} + x_2 \begin{bmatrix} 4 \\ 4 \\ 8 \end{bmatrix} + x_3 \begin{bmatrix} 5 \\ 3 \\ 9 \end{bmatrix}$

c. [M] Use 2 floors of plan A and 15 floors of plan B. Or,
use 6 floors of plan A, 2 floors of plan B, and 8 floors of
plan C. These are the only feasible solutions. There are
other mathematical solutions, but they require a
negative number of floors of one or two of the plans,
which makes no physical sense.
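The claim that only two floor plans mixes are feasible can be checked by brute force. The sketch below assumes the per-floor apartment counts read from part (b), derives the required totals from the first stated solution (2 floors of A, 15 of B) rather than from the exercise statement, and searches all nonnegative whole-number combinations. The names `PLANS` and `units` are illustrative.

```python
# (three-bedroom, two-bedroom, one-bedroom) apartments per floor of each plan.
PLANS = {"A": (3, 7, 8), "B": (4, 4, 8), "C": (5, 3, 9)}


def units(a, b, c):
    """Total apartments of each type for a, b, c floors of plans A, B, C."""
    return tuple(
        a * PLANS["A"][i] + b * PLANS["B"][i] + c * PLANS["C"][i]
        for i in range(3)
    )


# Assumption: the target totals are whatever the first stated solution yields.
target = units(2, 15, 0)

# Upper bounds follow from the three-bedroom total: 3a, 4b, 5c <= target[0].
feasible = [
    (a, b, c)
    for a in range(target[0] // 3 + 1)
    for b in range(target[0] // 4 + 1)
    for c in range(target[0] // 5 + 1)
    if units(a, b, c) == target
]
print(target, feasible)
```

Under these assumptions the search should return exactly the two mixes named in the answer, which is consistent with the general solution being a line in $\mathbb{R}^3$ that meets the nonnegative integer points only twice.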


Chapter 2 

Section 2.1, page 102 


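1. $\begin{bmatrix} -4 & 0 & 2 \\ -8 & 10 & -4 \end{bmatrix}$, $\begin{bmatrix} 3 & -5 & 3 \\ -7 & 6 & -7 \end{bmatrix}$, not defined, $\begin{bmatrix} 1 & 13 \\ -7 & -6 \end{bmatrix}$

3. $\begin{bmatrix} -1 & 1 \\ -5 & 5 \end{bmatrix}$, $\begin{bmatrix} 12 & -3 \\ 15 & -6 \end{bmatrix}$

5. a. $Ab_1 = \begin{bmatrix} -7 \\ 7 \\ 12 \end{bmatrix}$, $Ab_2 = \begin{bmatrix} 4 \\ -6 \\ -7 \end{bmatrix}$, $AB = \begin{bmatrix} -7 & 4 \\ 7 & -6 \\ 12 & -7 \end{bmatrix}$

b. $AB = \begin{bmatrix} -1\cdot 3 + 2(-2) & -1(-2) + 2\cdot 1 \\ 5\cdot 3 + 4(-2) & 5(-2) + 4\cdot 1 \\ 2\cdot 3 - 3(-2) & 2(-2) - 3\cdot 1 \end{bmatrix} = \begin{bmatrix} -7 & 4 \\ 7 & -6 \\ 12 & -7 \end{bmatrix}$

7. 3 x 7

9. k = 5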
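Exercise 5 computes AB twice: column by column (each column of AB is A times a column of B) and entry by entry with the row-column rule. The sketch below runs both methods on the matrices implied by the expansion in part (b); the function names are illustrative.

```python
A = [[-1, 2],
     [5, 4],
     [2, -3]]
B = [[3, -2],
     [-2, 1]]


def mat_vec(a, x):
    """Product of matrix a with column vector x."""
    return [sum(a[i][k] * x[k] for k in range(len(x))) for i in range(len(a))]


def by_columns(a, b):
    """Build AB from the columns A b_1, ..., A b_p."""
    cols = [mat_vec(a, [row[j] for row in b]) for j in range(len(b[0]))]
    return [[cols[j][i] for j in range(len(cols))] for i in range(len(a))]


def row_column_rule(a, b):
    """Entry (i, j) of AB is the sum over k of a[i][k] * b[k][j]."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


print(by_columns(A, B))
print(row_column_rule(A, B))
```

Both calls should print the same 3 x 2 matrix, matching parts (a) and (b).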


19. The third column of AB is the sum of the first two columns
of AB. Here’s why. Write $B = [\,b_1\ b_2\ b_3\,]$. By
definition, the third column of AB is $Ab_3$. If $b_3 = b_1 + b_2$,
then $Ab_3 = A(b_1 + b_2) = Ab_1 + Ab_2$, by a property of
matrix-vector multiplication.

21. The columns of A are linearly dependent. Why? 


23. Hint: Suppose x satisfies $Ax = 0$, and show that x must
be 0.

25. Hint: Use the results of Exercises 23 and 24, and apply the
associative law of multiplication to the product CAD.

27. $u^T v = v^T u = -2a + 3b - 4c$,

$uv^T = \begin{bmatrix} -2a & -2b & -2c \\ 3a & 3b & 3c \\ -4a & -4b & -4c \end{bmatrix}$, $vu^T = \begin{bmatrix} -2a & 3a & -4a \\ -2b & 3b & -4b \\ -2c & 3c & -4c \end{bmatrix}$

29. Hint: For Theorem 2(b), show that the (i, j)-entry of
$A(B + C)$ equals the (i, j)-entry of $AB + AC$.

31. Hint: Use the definition of the product $I_m A$ and the fact that
$I_m x = x$ for x in $\mathbb{R}^m$.

33. Hint: First write the (i, j)-entry of $(AB)^T$, which is the
(j, i)-entry of AB. Then, to compute the (i, j)-entry in
$B^T A^T$, use the facts that the entries in row i of $B^T$ are
$b_{1i}, \ldots, b_{ni}$, because they come from column i of B, and
the entries in column j of $A^T$ are $a_{j1}, \ldots, a_{jn}$, because they
come from row j of A.

35. [M] The answer here depends on the choice of matrix
program. For MATLAB, use the help command to read
about zeros, ones, eye, and diag.


37. [M] Display your results and report your conclusions. 

39. [M] The matrix S “shifts” the entries in a vector
(a, b, c, d, e) to yield (b, c, d, e, 0). $S^5$ is the 5 x 5 zero
matrix. So is $S^6$.
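The shifting behavior described in Exercise 39 is easy to see numerically. The sketch below assumes S is the 5 x 5 matrix with ones just above the main diagonal, which is the matrix that produces the stated shift; the helper names are illustrative.

```python
n = 5
# Ones on the superdiagonal: row i picks out entry i + 1 of the input vector.
S = [[1 if j == i + 1 else 0 for j in range(n)] for i in range(n)]


def mat_vec(m, x):
    return [sum(m[i][k] * x[k] for k in range(len(x))) for i in range(len(m))]


def mat_mul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def power(m, k):
    """m multiplied by itself k times, starting from the identity."""
    result = [[int(i == j) for j in range(len(m))] for i in range(len(m))]
    for _ in range(k):
        result = mat_mul(result, m)
    return result


print(mat_vec(S, [1, 2, 3, 4, 5]))
print(power(S, 5))
```

Each multiplication by S pushes the entries up one slot and feeds in a zero, so after five shifts nothing is left. That is why $S^5$, and every higher power, is the zero matrix.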


Section 2.2, page 111 


1. $\begin{bmatrix} 2 & -3 \\ -5/2 & 4 \end{bmatrix}$

3. $-\dfrac{1}{5}\begin{bmatrix} -5 & -5 \\ 7 & 8 \end{bmatrix}$ or $\begin{bmatrix} 1 & 1 \\ -7/5 & -8/5 \end{bmatrix}$

5. $x_1 = 7$ and $x_2 = -9$

11. $AD = \begin{bmatrix} 2 & 3 & 5 \\ 2 & 6 & 15 \\ 2 & 12 & 25 \end{bmatrix}$, $DA = \begin{bmatrix} 2 & 2 & 2 \\ 3 & 6 & 9 \\ 5 & 20 & 25 \end{bmatrix}$

Right-multiplication (that is, multiplication on the right) by
D multiplies each column of A by the corresponding
diagonal entry of D. Left-multiplication by D multiplies
each row of A by the corresponding diagonal entry of D.
The Study Guide tells how to make AB = BA, but you
should try this yourself before looking there.

13. Hint: One of the two matrices is Q. 

15. Answer the questions before looking in the Study Guide. 


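7. a and b: $\begin{bmatrix} -9 \\ 4 \end{bmatrix}$, $\begin{bmatrix} 11 \\ -5 \end{bmatrix}$, $\begin{bmatrix} 6 \\ -2 \end{bmatrix}$, and $\begin{bmatrix} 13 \\ -5 \end{bmatrix}$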


9. Write out your answers before checking the Study Guide. 


17. $b_1 = \begin{bmatrix} 7 \\ 4 \end{bmatrix}$, $b_2 = \begin{bmatrix} -8 \\ -5 \end{bmatrix}$

11. The proof can be modeled after the proof of Theorem 5.

13. $AB = AC \Rightarrow A^{-1}AB = A^{-1}AC \Rightarrow IB = IC \Rightarrow
B = C$. No, in general, B and C can be different when A
is not invertible. See Exercise 10 in Section 2.1.

15. $D = C^{-1}B^{-1}A^{-1}$. Show that D works.

17. $A = BCB^{-1}$


























































19. After you find $X = CB - A$, show that X is a solution.

21. Hint: Consider the equation Ax = 0. 

23. Hint: If Ax = 0 has only the trivial solution, then there are 
no free variables in the equation Ax = 0, and each column 
of A is a pivot column. 


25. Hint: Consider the case a = b = 0. Then consider the
vector $\begin{bmatrix} -b \\ a \end{bmatrix}$, and use the fact that $ad - bc = 0$.


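27. Hint: For part (a), interchange A and B in the box
following Example 6 in Section 2.1, and then replace B by
the identity matrix. For parts (b) and (c), begin by writing

\[
A = \begin{bmatrix} \text{row}_1(A) \\ \text{row}_2(A) \\ \text{row}_3(A) \end{bmatrix}
\]

29. $\begin{bmatrix} -7 & 2 \\ 4 & -1 \end{bmatrix}$

31. $\begin{bmatrix} 8 & 3 & 1 \\ 10 & 4 & 1 \\ 7/2 & 3/2 & 1/2 \end{bmatrix}$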


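33. $A^{-1} = B = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ -1 & 1 & 0 & & 0 \\ 0 & -1 & 1 & & \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & -1 & 1 \end{bmatrix}$. Hint: For
j = 1, ..., n, let $a_j$, $b_j$, and $e_j$ denote the jth columns of
A, B, and I, respectively. Use the facts that $a_j - a_{j+1} =
e_j$ and $b_j = e_j - e_{j+1}$ for j = 1, ..., n - 1, and
$a_n = b_n = e_n$.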




. Find this by row reducing [ A 



37. C = 



39. .27, .30, and .23 inch, respectively 

41. [M] 12,1.5,21.5, and 12 newtons, respectively 


Section 2.3, page 117 

The abbreviation IMT (here and in the Study Guide) denotes the 
Invertible Matrix Theorem (Theorem 8). 


1. Invertible, by the IMT. Neither column of the matrix is a 
multiple of the other column, so they are linearly 
independent. Also, the matrix is invertible by Theorem 4 in 
Section 2.2 because the determinant is nonzero. 



Invertible, by the IMT. The matrix row reduces to 



and has 3 pivot positions. 



Not invertible, by the IMT. The matrix row reduces to 


1 

0 

0 


0 

3 

0 



and is not row equivalent to / 3 . 



Invertible, by the IMT. The matrix row reduces to
$\begin{bmatrix} -1 & -3 & 0 & 1 \\ 0 & -4 & 8 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$ and has four pivot positions.



9. [M] The 4 x 4 matrix has four pivot positions, so it is 
invertible by the IMT. 


11. The Study Guide will help, but first try to answer the 
questions based on your careful reading of the text. 


13. A square upper triangular matrix is invertible if and only if 
all the entries on the diagonal are nonzero. Why? 


Note: The answers below for Exercises 15-29 mention the IMT. 
In many cases, part or all of an acceptable answer could also be 
based on results that were used to establish the IMT. 


15. If A has two identical columns then its columns are linearly
dependent. Part (e) of the IMT shows that A cannot be
invertible.

17. If A is invertible, so is $A^{-1}$, by Theorem 6 in Section 2.2.
By (e) of the IMT applied to $A^{-1}$, the columns of $A^{-1}$ are
linearly independent.

19. By (e) of the IMT, D is invertible. Thus the equation
$Dx = b$ has a solution for each b in $\mathbb{R}^7$, by (g) of the IMT.
Can you say more?

21. The matrix G cannot be invertible, by Theorem 5 in Section
2.2 or by the paragraph following the IMT. So (g) of the
IMT is false and so is (h). The columns of G do not
span $\mathbb{R}^n$.

23. Statement (b) of the IMT is false for K, so statements (e)
and (h) are also false. That is, the columns of K are linearly
dependent and the columns do not span $\mathbb{R}^n$.

25. Hint: Use the IMT first.

27. Let W be the inverse of AB. Then $ABW = I$ and
$A(BW) = I$. Unfortunately, this equation by itself does
not prove that A is invertible. Why not? Finish the proof
before you check the Study Guide.

29. Since the transformation $x \mapsto Ax$ is not one-to-one,
statement (f) of the IMT is false. Then (i) is also false and
the transformation $x \mapsto Ax$ does not map $\mathbb{R}^n$ onto $\mathbb{R}^n$. Also,
A is not invertible, which implies that the transformation
$x \mapsto Ax$ is not invertible, by Theorem 9.

31. Hint: If the equation $Ax = b$ has a solution for each b, then
A has a pivot in each row (Theorem 4 in Section 1.4).
Could there be free variables in an equation $Ax = b$?

33. Hint: First show that the standard matrix of T is invertible.
Then use a theorem or theorems to show that
$T^{-1}(x) = Bx$, where $B = \begin{bmatrix} 7 & 9 \\ 4 & 5 \end{bmatrix}$.

35. Hint: To show that T is one-to-one, suppose that
$T(u) = T(v)$ for some vectors u and v in $\mathbb{R}^n$. Deduce
that u = v. To show that T is onto, suppose y represents
an arbitrary vector in $\mathbb{R}^n$ and use the inverse S to produce
an x such that $T(x) = y$. A second proof can be given
using Theorem 9 together with a theorem from
Section 1.9.

37. Hint: Consider the standard matrices of T and U. 

39. Given any v in $\mathbb{R}^n$, we may write $v = T(x)$ for some x,
because T is an onto mapping. Then, the assumed
properties of S and U show that $S(v) = S(T(x)) =
x$ and $U(v) = U(T(x)) = x$. So $S(v)$ and $U(v)$ are equal
for each v. That is, S and U are the same function from $\mathbb{R}^n$
into $\mathbb{R}^n$.



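\[
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
= \begin{bmatrix} I & 0 \\ A_{21}A_{11}^{-1} & I \end{bmatrix}
\begin{bmatrix} A_{11} & 0 \\ 0 & S \end{bmatrix}
\begin{bmatrix} I & A_{11}^{-1}A_{12} \\ 0 & I \end{bmatrix},
\quad\text{with } S = A_{22} - A_{21}A_{11}^{-1}A_{12}
\]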


\[
G_{k+1} = [\,X_k \ \ x_{k+1}\,]\begin{bmatrix} X_k^T \\ x_{k+1}^T \end{bmatrix}
= X_k X_k^T + x_{k+1}x_{k+1}^T
= G_k + x_{k+1}x_{k+1}^T
\]

Only the outer product matrix $x_{k+1}x_{k+1}^T$ needs to be
computed (and then added to $G_k$).
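The update $G_{k+1} = G_k + x_{k+1}x_{k+1}^T$ means a new data column costs one outer product rather than a full recomputation of $XX^T$. The following sketch compares the two on small made-up vectors; the data and the helper names are illustrative only.

```python
def outer(x):
    """Outer product x x^T as a list of rows."""
    return [[xi * xj for xj in x] for xi in x]


def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def gram_direct(columns):
    """X X^T computed entry by entry, where X has the given columns."""
    n = len(columns[0])
    return [
        [sum(col[i] * col[j] for col in columns) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical data: three columns already seen, then one new column.
seen = [[1, 2, 0], [0, 1, 1], [3, -1, 2]]
new = [2, 0, -1]

G_k = gram_direct(seen)
G_next = add(G_k, outer(new))  # the rank-one update from the answer
print(G_next)
```

The updated matrix should equal `gram_direct(seen + [new])`, so the full product never has to be rebuilt as columns arrive.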


41. [M] a. The exact solution of (3) is x\ = 3.94 and 

x 2 = .49. The exact solution of (4) is x\ =2.90 and 
x 2 = 2.00. 

b. When the solution of (4) is used as an approximation for 
the solution in (3), the error in using the value of 2.90 
for x\ is about 26%, and the error in using 2.0 for x 2 is 
about 308%. 

c. The condition number of the coefficient matrix is 3363. 
The percentage change in the solution from (3) to (4) is 
about 7700 times the percentage change in the right side 
of the equation. This is the same order of magnitude as 
the condition number. The condition number gives a 
rough measure of how sensitive the solution of Ax = b 
can be to changes in b. Further information about the 
condition number is given at the end of Chapter 6 and in 
Chapter 7. 

43. [M] cond(A) $\approx$ 69,000, which is between $10^4$ and $10^5$. So
about 4 or 5 digits of accuracy may be lost. Several
experiments with MATLAB should verify that x and $x_1$
agree to 11 or 12 digits.

45. [M] Some versions of MATLAB issue a warning when
asked to invert a Hilbert matrix of order about 12 or larger
using floating-point arithmetic. The product $AA^{-1}$ should
have several off-diagonal entries that are far from being
zero. If not, try a larger matrix.

Section 2.4, page 123 


1. $\begin{bmatrix} A & B \\ EA + C & EB + D \end{bmatrix}$

3. $\begin{bmatrix} Y & Z \\ W & X \end{bmatrix}$

5. $Y = B^{-1}$ (explain why), $X = -B^{-1}A$, $Z = C$

7. $X = A^{-1}$ (why?), $Y = -BA^{-1}$, $Z = 0$ (why?)

9. $X = -A_{21}A_{11}^{-1}$, $Y = -A_{31}A_{11}^{-1}$, $B_{22} = A_{22} - A_{21}A_{11}^{-1}A_{12}$

11. You can check your answers in the Study Guide.

19. $W(s) = I_m - C(A - sI_n)^{-1}B$. This is the Schur
complement of $A - sI_n$ in the system matrix.


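21. a. $A^2 = \begin{bmatrix} 1 & 0 \\ 3 & -1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 3 & -1 \end{bmatrix} = \begin{bmatrix} 1 + 0 & 0 + 0 \\ 3 - 3 & 0 + (-1)^2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$

b. $M^2 = \begin{bmatrix} A & 0 \\ I & -A \end{bmatrix}\begin{bmatrix} A & 0 \\ I & -A \end{bmatrix} = \begin{bmatrix} A^2 + 0 & 0 + 0 \\ A - A & 0 + (-A)^2 \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix}$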


23. If $A_1$ and $B_1$ are (k + 1) x (k + 1) and lower triangular,
then we can write $A_1 = \begin{bmatrix} a & 0^T \\ v & A \end{bmatrix}$ and
$B_1 = \begin{bmatrix} b & 0^T \\ w & B \end{bmatrix}$, where A and B are k x k and lower
triangular, v and w are in $\mathbb{R}^k$, and a and b are suitable
scalars. Assume that the product of k x k lower triangular
matrices is lower triangular, and compute the product $A_1B_1$.
What do you conclude?


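25. Use Example 5 to find the inverse of a matrix of the form
$B = \begin{bmatrix} B_{11} & 0 \\ 0 & B_{22} \end{bmatrix}$, where $B_{11}$ is p x p, $B_{22}$ is q x q, and
B is invertible. Partition the matrix A, and apply your result
twice to find that

\[
A^{-1} = \begin{bmatrix} -5 & 2 & 0 & 0 & 0 \\ 3 & -1 & 0 & 0 & 0 \\ 0 & 0 & 1/2 & 0 & 0 \\ 0 & 0 & 0 & 3 & -4 \\ 0 & 0 & 0 & -5/2 & 7/2 \end{bmatrix}
\]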

27. a, b. [M] The commands to be used in these exercises
will depend on the matrix program.

c. The algebra needed comes from the block matrix
equation

\[
\begin{bmatrix} A_{11} & 0 \\ A_{21} & A_{22} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}
\]

where $x_1$ and $b_1$ are in $\mathbb{R}^{20}$ and $x_2$ and $b_2$ are in $\mathbb{R}^{30}$.
Then $A_{11}x_1 = b_1$, which can be solved to produce $x_1$.
The equation $A_{21}x_1 + A_{22}x_2 = b_2$ yields
$A_{22}x_2 = b_2 - A_{21}x_1$, which can be solved for $x_2$ by row
reducing the matrix $[\,A_{22}\ \ c\,]$, where $c = b_2 - A_{21}x_1$.

13. Hint: Suppose A is invertible, and let $A^{-1} = \begin{bmatrix} D & E \\ F & G \end{bmatrix}$.
Show that $BD = I$ and $CG = I$. This implies that B and
C are invertible. (Explain why!) Conversely, suppose B
and C are invertible. To prove that A is invertible, guess
what $A^{-1}$ must be and check that it works.


Section 2.5, page 131 


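1. $Ly = b \Rightarrow y = \begin{bmatrix} -7 \\ -2 \\ 6 \end{bmatrix}$, $Ux = y \Rightarrow x = \begin{bmatrix} 3 \\ 4 \\ -6 \end{bmatrix}$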
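The two-stage solve in Exercise 1 (forward substitution for $Ly = b$, then back substitution for $Ux = y$) can be sketched as follows. The matrices L and U and the vector b below are assumptions: they are hypothetical data chosen to be consistent with the answer's y and x, not values quoted from this answer key.

```python
from fractions import Fraction


def forward(lower, b):
    """Solve L y = b for lower triangular L, top row first."""
    y = []
    for i in range(len(b)):
        s = sum(lower[i][j] * y[j] for j in range(i))
        y.append(Fraction(b[i] - s) / lower[i][i])
    return y


def backward(upper, y):
    """Solve U x = y for upper triangular U, bottom row first."""
    n = len(y)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(upper[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / upper[i][i]
    return x


# Hypothetical factors and right side (assumed, see the note above).
L = [[1, 0, 0], [-1, 1, 0], [2, -5, 1]]
U = [[3, -7, -2], [0, -2, -1], [0, 0, -1]]
b = [-7, 5, 2]

y = forward(L, b)
x = backward(U, y)
print(y, x)
```

Neither stage needs row reduction, because each triangular system exposes one unknown at a time. That is the reason an LU factorization pays off when the same A is used with many right sides.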


9 


11 


13 


15. 



1 

1 _ 


1 

1_ 

3. y = 

3 

,X = 

3 


1 

LO 

1_ 


1 

1_ 


5. y = 


1 

_1 


1 

<N 

1_ 

5 


-1 

1 

,x = 

2 

-3 


-3 


7. LU = 


1 

0 " 

" 2 

5 

- 3/2 

1 

0 

7/2 


1 

1 

3 

1 

2 

1/3 

1 

1 

4 
2 

1 

3 

1/2 


0 

1 

2/3 

0 

1 

1 

0 

1 

5 

-1 


0 

0 

1 

0 

0 

1 


3 

0 

0 

3 

0 

0 


-1 

-3 

0 

-6 

5 

0 


2 

12 

-8 

3 

-4 

5 


17. U~ l = 


L~ l = 


4T 1 = 


0 

1 

-2 

1/4 

0 

0 

1 

1 

-2 

1/8 

-3/2 

-1 


0 0 
0 0 
1 0 
0 1 


° ir 2 
0 0 

1 _IL 0 

3/8 

- 1/2 

0 

0 0 “ 

1 0 

0 1 _ 

3/8 
- 1/2 
0 


1 

0 

0 

0 


3 

2 

0 

0 


-5 

3 

0 

0 


-3 

1 

0 

0 


4 

3 

0 


4 

-5 

0 


-2 

3 

5 


1/4 

1/2 

1/2 


1/4 

1/2 

1/2 


19. Hint: Think about row reducing [ A I ]. 

21. Hint: Represent the row operations by a sequence of 
elementary matrices. 

23. a. Denote the rows of D as transposes of column vectors.
Then partitioned matrix multiplication yields

\[
A = CD = [\,c_1 \ \cdots \ c_4\,]\begin{bmatrix} v_1^T \\ \vdots \\ v_4^T \end{bmatrix} = c_1v_1^T + \cdots + c_4v_4^T
\]

b. A has 40,000 entries. Since C has 1600 entries and D
has 400 entries, together they occupy only 5% of the
memory needed to store A.


25. Explain why U , D , and V T are invertible. Then use a 
theorem on the inverse of a product of invertible matrices 


27. a. 


" -7 " 



3 " 

v i 

A— 

-2 

,Ux = y = 

^ X = 

4 

6 



-6 

w 


J 1 

— 

AAA 


1 

vVV 

1/2 ohm 

V 


_ 



. J 2 

— 

A 

2 


> 9/2 
>ohms 

V 


_ 



b. 



3/4 ohm 


v 


3 


29. a. 


1 + R 2 /R 1 

-1 /Ri - R 2 /{R v R 3 ) 


-1 /R 


-Ri 

1 + r 2 /r 2 


b. A = 


—1 

1 — 

0 

— 1 

-12 ■ 

1 

1 

O 

1- 

— 1 

1 — 

0 

— 1 

_ -1/36 

1 


V 



* j 

r 

* ^ 1 

1 

1 4 
1 „ 

1 ' - 
1 

1 

1 1 ^-* 

1 ohms I V 

1 

1 

L. - 

1 


> i 2 

— 

A 

2 

vVV 

12 ohms 

V 


_ 



l 


V 


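31. [M]

a. $L = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
-.25 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
-.25 & -.0667 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & -.2667 & -.2857 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & -.2679 & -.0833 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & -.2917 & -.2921 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & -.2697 & -.0861 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & -.2948 & -.2931 & 1
\end{bmatrix}$

$U = \begin{bmatrix}
4 & -1 & -1 & 0 & 0 & 0 & 0 & 0 \\
0 & 3.75 & -.25 & -1 & 0 & 0 & 0 & 0 \\
0 & 0 & 3.7333 & -1.0667 & -1 & 0 & 0 & 0 \\
0 & 0 & 0 & 3.4286 & -.2857 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 3.7083 & -1.0833 & -1 & 0 \\
0 & 0 & 0 & 0 & 0 & 3.3919 & -.2921 & -1 \\
0 & 0 & 0 & 0 & 0 & 0 & 3.7052 & -1.0861 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 3.3868
\end{bmatrix}$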
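The factors in Exercise 31 can be regenerated with a plain LU factorization without row interchanges. The sketch below assumes the coefficient matrix is the 8 x 8 banded matrix with 4 on the diagonal, -1 two places off the diagonal, and -1 linking each odd-even pair of neighbors; that pattern is inferred from the entries of U and is not quoted from the exercise.

```python
def lu(a):
    """LU factorization by row replacement only (no pivoting); returns (L, U)."""
    n = len(a)
    lower = [[float(i == j) for j in range(n)] for i in range(n)]
    upper = [list(map(float, row)) for row in a]
    for c in range(n):
        for r in range(c + 1, n):
            m = upper[r][c] / upper[c][c]  # multiplier stored in L
            lower[r][c] = m
            upper[r] = [u - m * p for u, p in zip(upper[r], upper[c])]
    return lower, upper


# Assumed coefficient matrix (see the note above).
n = 8
A = [[0] * n for _ in range(n)]
for i in range(n):
    A[i][i] = 4
for i in range(n - 2):
    A[i][i + 2] = A[i + 2][i] = -1
for i in range(0, n, 2):
    A[i][i + 1] = A[i + 1][i] = -1

L, U = lu(A)
print([round(U[i][i], 4) for i in range(n)])
```

The printed diagonal of U should match the pivots in the answer to four decimals. The band structure of A carries over to L and U, which is what makes the factorization cheap for this kind of matrix.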


b. x = (3.9569,6.5885,4.2392,7.3971,5.6029,8.7608,9.4115,12.0431) 


$A^{-1} = \begin{bmatrix}
.2953 & .0866 & .0945 & .0509 & .0318 & .0227 & .0100 & .0082 \\
.0866 & .2953 & .0509 & .0945 & .0227 & .0318 & .0082 & .0100 \\
.0945 & .0509 & .3271 & .1093 & .1045 & .0591 & .0318 & .0227 \\
.0509 & .0945 & .1093 & .3271 & .0591 & .1045 & .0227 & .0318 \\
.0318 & .0227 & .1045 & .0591 & .3271 & .1093 & .0945 & .0509 \\
.0227 & .0318 & .0591 & .1045 & .1093 & .3271 & .0509 & .0945 \\
.0100 & .0082 & .0318 & .0227 & .0945 & .0509 & .2953 & .0866 \\
.0082 & .0100 & .0227 & .0318 & .0509 & .0945 & .0866 & .2953
\end{bmatrix}$

Obtain $A^{-1}$ directly and then compute $A^{-1} - U^{-1}L^{-1}$
to compare the two methods for inverting a matrix.


Section 2.6, page 138 


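1. $C = \begin{bmatrix} .10 & .60 & .60 \\ .30 & .20 & 0 \\ .30 & .10 & .10 \end{bmatrix}$, $\left\{\begin{matrix} \text{intermediate} \\ \text{demand} \end{matrix}\right\} = \begin{bmatrix} 60 \\ 20 \\ 10 \end{bmatrix}$

3. $x = \begin{bmatrix} 40 \\ 15 \\ 15 \end{bmatrix}$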


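5. $x = \begin{bmatrix} 110 \\ 120 \end{bmatrix}$

7. a. $\begin{bmatrix} 1.6 \\ 1.2 \end{bmatrix}$ b. $\begin{bmatrix} 111.6 \\ 121.2 \end{bmatrix}$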


15. (12, -6, 3)

17. $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1/2 & -\sqrt{3}/2 & 0 \\ 0 & \sqrt{3}/2 & 1/2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

19. The triangle with vertices at (7, 2, 0), (7.5, 5, 0), (5, 5, 0)


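9. $x = \begin{bmatrix} 82.8 \\ 131.0 \\ 110.3 \end{bmatrix}$

21. [M] $\begin{bmatrix} 2.2586 & -1.0395 & -.3473 \\ -1.3495 & 2.3441 & .0696 \\ .0910 & -.3046 & 1.2777 \end{bmatrix}\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} R \\ G \\ B \end{bmatrix}$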


11. Hint: Use properties of transposes to obtain 
p r = p T C + v r , so that p r x = (p r C + v r )x = 
p r Cx + v r x. Now compute p r x from the production 
equation. 

13. [M] x = (99576, 97703, 51231,131570,49488, 329554, 
13835). The entries in x suggest more precision in the 
answer than is warranted by the entries in d, which appear 
to be accurate only to perhaps the nearest thousand. So a 
more realistic answer for x might be 
x = 1000 x (100, 98, 51,132,49, 330,14). 

15. [M] $\mathbf{x}^{(12)}$ is the first vector whose entries are accurate to the
nearest thousand. The calculation of $\mathbf{x}^{(12)}$ takes about
1260 flops, while row reduction of $[\,(I - C) \;\; \mathbf{d}\,]$ takes
only about 550 flops. If $C$ is larger than $20 \times 20$, then fewer
flops are needed to compute $\mathbf{x}^{(12)}$ by iteration than to
compute the equilibrium vector $\mathbf{x}$ by row reduction. As the
size of $C$ grows, the advantage of the iterative method
increases. Also, because $C$ becomes more sparse for larger
models of the economy, fewer iterations are needed for
reasonable accuracy.
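
The iterative method mentioned in this answer can be sketched as the recurrence $\mathbf{x}^{(k+1)} = \mathbf{d} + C\mathbf{x}^{(k)}$. The sketch below is only an illustration: it uses the $3 \times 3$ consumption matrix read from the answer to Exercise 1 of this section, and the final demand $\mathbf{d} = (18, 0, 0)$ is an assumption, chosen because it satisfies $(I - C)\mathbf{x} = \mathbf{d}$ for $\mathbf{x} = (40, 15, 15)$. The function names are hypothetical.

```python
# Iterative estimate of a Leontief production vector: x(k+1) = d + C x(k).
# C is the small consumption matrix from Exercise 1; d is an assumed demand.

def mat_vec(M, v):
    """Multiply matrix M (a list of rows) by the vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def leontief_iterate(C, d, steps):
    """Return x(steps) for the recurrence x(k+1) = d + C x(k), with x(0) = d."""
    x = list(d)
    for _ in range(steps):
        x = [d_i + cx_i for d_i, cx_i in zip(d, mat_vec(C, x))]
    return x

C = [[0.10, 0.60, 0.60],
     [0.30, 0.20, 0.00],
     [0.30, 0.10, 0.10]]
d = [18.0, 0.0, 0.0]  # assumed final demand

x = leontief_iterate(C, d, 500)
print([round(x_i, 4) for x_i in x])
```

Because every column sum of this $C$ is less than 1, the iterates converge; for a large sparse $C$, each step costs only one matrix-vector product.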

Section 2.7, page 146 


Section 2.8, page 153 

1. The set is closed under sums but not under multiplication 
by a negative scalar. (Sketch an example.) 

3. The set is not closed under sums or scalar multiples. The
subset consisting of the points on the line $x_2 = x_1$ is a
subspace, so any “counterexample” must use at least one
point not on this line.

5. No. The system corresponding to $[\,\mathbf{v}_1 \;\; \mathbf{v}_2 \;\; \mathbf{w}\,]$ is
inconsistent.

7. a. The three vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$

b. Infinitely many vectors 

c. Yes, because Ax = p has a solution. 

9. No, because $A\mathbf{p} \neq \mathbf{0}$.

11. $p = 4$ and $q = 3$. Nul $A$ is a subspace of $\mathbb{R}^4$ because
solutions of $A\mathbf{x} = \mathbf{0}$ must have four entries, to match the
columns of $A$. Col $A$ is a subspace of $\mathbb{R}^3$ because each
column vector has three entries.

13. For Nul $A$, choose $(1, -2, 1, 0)$ or $(-1, 4, 0, 1)$, for
example. For Col $A$, select any column of $A$.



3. 


V2/2 

V 2/2 

0 


-a/2/2 
a/2/2 
0 


V2 

2 V2 

1 




" a/3/2 

1/2 

0 " 



5. 

1/2 

-a/3/2 

0 


17. 


0 

0 

1 _ 




1/2 

-a/3/2 

3 + 4 a/3 


7. 

a/3/2 

1/2 

4- 

-3 a/3 

19. 


0 

0 


1 



See the Practice Problem. 


9. $A(BD)$ requires 1600 multiplications. $(AB)D$ requires 808
multiplications. The first method uses about twice as many
multiplications. If $D$ had 20,000 columns, the counts would
be 160,000 and 80,008, respectively.
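
The counts in this answer can be reproduced with a short tally. This is a sketch under an assumption that is not stated in the answer itself: $A$ and $B$ are taken to be $2 \times 2$ and $D$ to be $2 \times 200$, sizes that are consistent with the totals 1600 and 808. The helper names are hypothetical.

```python
def matmul_cost(m, n, p):
    """Scalar multiplications used by the row-column rule for (m x n)(n x p)."""
    return m * n * p

def chain_costs(cols_d):
    """Costs of A(BD) and (AB)D, assuming A, B are 2 x 2 and D is 2 x cols_d."""
    a_bd = matmul_cost(2, 2, cols_d) + matmul_cost(2, 2, cols_d)  # BD, then A(BD)
    ab_d = matmul_cost(2, 2, 2) + matmul_cost(2, 2, cols_d)       # AB, then (AB)D
    return a_bd, ab_d

print(chain_costs(200))
print(chain_costs(20000))
```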


11. Use the fact that
\[
\sec\varphi - \tan\varphi\,\sin\varphi = \frac{1}{\cos\varphi} - \frac{\sin^2\varphi}{\cos\varphi} = \cos\varphi
\]

23


-1 

_1 


1 

a 

1_ 


1 

0 

1_ 

1- 

O 

1_ 


1 

0 

_1 


1 

b-< 

O 

_1 


. First apply the 


linear transformation A, and then translate by p. 


Yes. Let $A$ be the matrix whose columns are the vectors
given. Then $A$ is invertible because its determinant is
nonzero, and so its columns form a basis for $\mathbb{R}^2$, by the IMT
(or by Example 5). (Other reasons for the invertibility of $A$
could be given.)

Yes. Let $A$ be the matrix whose columns are the vectors
given. Row reduction shows three pivots, so $A$ is invertible.
By the IMT, the columns of $A$ form a basis for $\mathbb{R}^3$.

No. Let $A$ be the $3 \times 2$ matrix whose columns are the
vectors given. The columns of $A$ cannot possibly span $\mathbb{R}^3$
because $A$ cannot have a pivot in every row. So the columns
are not a basis for $\mathbb{R}^3$. (They are a basis for a plane in $\mathbb{R}^3$.)


Read the section carefully, and write your answers before 
checking the Study Guide. This section has terms and key 
concepts that you must learn now before going on. 


Basis for Col A: 


1 

1_ 


1 

_1 

6 

5 

5 

1 

LO 

1_ 


1 

_1 


1_ 


" -7 " 

-5 


6 

1 

5 

0 

1 

0 

1_ 


1 


13. 


Basis for Nul A:


Basis for Col A: 


1 " 


1 

1 _ 


- -3 " 

1 


2 


3 

2 

5 

2 

5 

5 

1 _ 


1 

_ 1 


1 

1 _ 


Basis for Nul A: 


2 " 


" -7 " 

-2.5 


.5 

1 

5 

0 

0 


-4 

0 


1 



Construct a nonzero $3 \times 3$ matrix $A$, and construct $\mathbf{b}$ to be
almost any convenient linear combination of the columns
of $A$.


29. Hint: You need a nonzero matrix whose columns are 
linearly dependent. 

31. If Col $F \neq \mathbb{R}^5$, then the columns of $F$ do not span $\mathbb{R}^5$.
Since $F$ is square, the IMT shows that $F$ is not invertible
and the equation $F\mathbf{x} = \mathbf{0}$ has a nontrivial solution. That is,
Nul $F$ contains a nonzero vector. Another way to describe
this is to write Nul $F \neq \{\mathbf{0}\}$.

33. If Col $Q = \mathbb{R}^4$, then the columns of $Q$ span $\mathbb{R}^4$. Since $Q$ is
square, the IMT shows that $Q$ is invertible and the equation
$Q\mathbf{x} = \mathbf{b}$ has a solution for each $\mathbf{b}$ in $\mathbb{R}^4$. Also, each solution
is unique, by Theorem 5 in Section 2.2.

35. If the columns of B are linearly independent, then the 
equation Bx = 0 has only the trivial (zero) solution. That 
is, Nul B = {0}. 


37. [M] Display the reduced echelon form of A, and select the 
pivot columns of A as a basis for Col A . For Nul A , write 
the solution of Ax = 0 in parametric vector form. 


Basis for Col A : 


Basis for Nul A : 


1 

_1 


1 

L/l 

_1 

-7 


9 

-5 

5 

7 

1 

LO 

1 _ 


-7 


" -2.5 “ 


" 4.5 " 


" -3.5 " 

-1.5 


2.5 


-1.5 

1 

5 

0 


0 

0 


1 


0 

0 


0 


1 


Section 2.9, page 159 




9. Basis for Col A: 


Basis for Nul A: 




1 

to 

_ 1 


l 

1_ 


-1 


5 

5 

4 

5 

-3 


1 

to 

1_ 


7 


; dim Nul A = 1 


dim Col A = 3 


11. Basis for Col A: 



A = 3 Basis for Nul A: 


1 

<N 

1_ 


1 

0 

1 _ 

5 


4 

-9 

5 

-7 

1 

0 

1 _ 


11 


; dim Col 




dim Nul A = 2 


13. Columns 1, 3, and 4 of the original matrix form a basis for
$H$, so dim $H = 3$.


15. Col $A = \mathbb{R}^3$, because $A$ has a pivot in each row, and so the
columns of $A$ span $\mathbb{R}^3$. Nul $A$ cannot equal $\mathbb{R}^2$, because
Nul $A$ is a subspace of $\mathbb{R}^5$. It is true, however, that Nul $A$ is
two-dimensional. Reason: The equation $A\mathbf{x} = \mathbf{0}$ has two
free variables, because $A$ has five columns and only three of
them are pivot columns.


17. See the Study Guide after you write your justifications. 

19. The fact that the solution space of $A\mathbf{x} = \mathbf{0}$ has a basis of
three vectors means that dim Nul $A = 3$. Since a $5 \times 7$
matrix $A$ has seven columns, the Rank Theorem shows that
rank $A = 7 - \dim \text{Nul } A = 4$. See the Study Guide for a
justification that does not explicitly mention the Rank
Theorem.


21. A $7 \times 6$ matrix has six columns. By the Rank Theorem,
dim Nul $A = 6 - \text{rank } A$. Since the rank is four,
dim Nul $A = 2$. That is, the dimension of the solution space
of $A\mathbf{x} = \mathbf{0}$ is two.


1. x = 3bi + 2b 2 = 3 






23. A $3 \times 4$ matrix $A$ with a two-dimensional column space has
two pivot columns. The remaining two columns will
correspond to free variables in the equation $A\mathbf{x} = \mathbf{0}$. So the
desired construction is possible. There are six possible
locations for the two pivot columns, one of which is
\[
\begin{bmatrix} \blacksquare & * & * & * \\ 0 & \blacksquare & * & * \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]
A simple construction is to take
two vectors in $\mathbb{R}^3$ that are obviously not linearly dependent
and place them in a matrix along with a copy of each vector,
in any order. The resulting matrix will obviously have a
two-dimensional column space. There is no need to worry
about whether Nul $A$ has the correct dimension, since this is
guaranteed by the Rank Theorem: dim Nul $A = 4 - \text{rank } A$.
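
As an illustration of the construction described in this answer, the sketch below builds a $3 \times 4$ matrix from two independent vectors and a copy of each, then checks the rank. The vectors `u` and `v` and the helper `rank` are assumptions made for the example, not part of the original answer.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows), by exact Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivots found so far
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

u = [1, 0, 0]              # assumed example vectors in R^3
v = [0, 1, 0]
columns = [u, v, u, v]     # two independent vectors plus a copy of each
A = [[col[i] for col in columns] for i in range(3)]

rank_A = rank(A)
nullity_A = len(columns) - rank_A   # Rank Theorem: dim Nul A = 4 - rank A
print(rank_A, nullity_A)
```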
25. The $p$ columns of $A$ span Col $A$ by definition. If
dim Col $A = p$, then the spanning set of $p$ columns is
automatically a basis for Col $A$, by the Basis Theorem. In
particular, the columns are linearly independent.

27. a. Hint: The columns of $B$ span $W$, and each vector $\mathbf{a}_j$ is
in $W$. The vector $\mathbf{c}_j$ is in $\mathbb{R}^p$ because $B$ has $p$ columns.

b. Hint: What is the size of $C$?

c. Hint: How are $B$ and $C$ related to $A$?

29. [M] Your calculations should show that the matrix
$[\,\mathbf{v}_1 \;\; \mathbf{v}_2 \;\; \mathbf{x}\,]$ corresponds to a consistent system. The
$\mathcal{B}$-coordinate vector of $\mathbf{x}$ is $(-5/3, 8/3)$.


Chapter 2 Supplementary Exercises, page 162 


1. a. T 

b. F 

c. T 

d. F 

e. F 

f. F 

g. T

h. T

i. T

j. F

k. T

l. F

m. F

n. T

o. F

p. T 


3. $I$




$A^2 = 2A - I$. Multiply by $A$: $A^3 = 2A^2 - A$.
Substitute $A^2 = 2A - I$: $A^3 = 2(2A - I) - A = 3A - 2I$.

Multiply by $A$ again: $A^4 = A(3A - 2I) = 3A^2 - 2A$.

Substitute the identity $A^2 = 2A - I$ again:

$A^4 = 3(2A - I) - 2A = 4A - 3I$
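
The substitution argument above can be spot-checked numerically. The matrix below is an assumed example, chosen only because it satisfies $A^2 = 2A - I$; it does not come from the exercise.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def combo(a, X, b, Y):
    """Entrywise linear combination a*X + b*Y."""
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

I = [[1, 0], [0, 1]]
A = [[1, 1], [0, 1]]          # assumed example with A^2 = 2A - I

A2 = matmul(A, A)
A3 = matmul(A2, A)
A4 = matmul(A3, A)

print(A2 == combo(2, A, -1, I))   # hypothesis A^2 = 2A - I
print(A3 == combo(3, A, -2, I))   # A^3 = 3A - 2I
print(A4 == combo(4, A, -3, I))   # A^4 = 4A - 3I
```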

10 -1 “ 

- 

9 10 

9. 

-5 -3 

- 


-3 

-8 


13 

27 


11. a. $p(x_i) = c_0 + c_1x_i + \cdots + c_{n-1}x_i^{n-1}
= \text{row}_i(V) \cdot \begin{bmatrix} c_0 \\ \vdots \\ c_{n-1} \end{bmatrix}
= \text{row}_i(V\mathbf{c}) = y_i$

b. Suppose $x_1, \dots, x_n$ are distinct, and suppose $V\mathbf{c} = \mathbf{0}$ for
some vector $\mathbf{c}$. Then the entries in $\mathbf{c}$ are the coefficients
of a polynomial whose value is zero at the distinct
points $x_1, \dots, x_n$. However, a nonzero polynomial of
degree $n - 1$ cannot have $n$ zeros, so the polynomial
must be identically zero. That is, the entries in $\mathbf{c}$ must all
be zero. This shows that the columns of $V$ are linearly
independent.


c. Hint: When $x_1, \dots, x_n$ are distinct, there is a vector $\mathbf{c}$
such that $V\mathbf{c} = \mathbf{y}$. Why?
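
A small numerical sketch of the interpolation idea in this answer: build the matrix whose $i$th row is $(1, x_i, x_i^2, \dots, x_i^{n-1})$ and solve for the coefficient vector. The data points and the helper `solve` are assumptions made for illustration.

```python
from fractions import Fraction

def solve(M, b):
    """Solve M c = b for an invertible square M by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(b_i)] for row, b_i in zip(M, b)]
    for c in range(n):
        pivot = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]      # scale pivot row to 1
        for i in range(n):
            if i != c:
                f = aug[i][c]
                aug[i] = [a - f * p for a, p in zip(aug[i], aug[c])]
    return [row[-1] for row in aug]

xs = [0, 1, 2]     # assumed distinct points
ys = [1, 3, 7]     # assumed values to interpolate

V = [[x ** k for k in range(len(xs))] for x in xs]   # row i is (1, x_i, x_i^2)
coeffs = solve(V, ys)

def p(t):
    """Interpolating polynomial c_0 + c_1 t + c_2 t^2."""
    return sum(c * t ** k for k, c in enumerate(coeffs))

print(coeffs, [p(x) for x in xs])
```

Distinct points are what make the elimination succeed; with a repeated point, two rows coincide and no pivot is found.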


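13. a. $P^2 = (\mathbf{u}\mathbf{u}^T)(\mathbf{u}\mathbf{u}^T) = \mathbf{u}(\mathbf{u}^T\mathbf{u})\mathbf{u}^T = \mathbf{u}(1)\mathbf{u}^T = P$

b. $P^T = (\mathbf{u}\mathbf{u}^T)^T = \mathbf{u}^{TT}\mathbf{u}^T = \mathbf{u}\mathbf{u}^T = P$

c. $Q^2 = (I - 2P)(I - 2P)$

$= I - I(2P) - 2PI + 2P(2P)$

$= I - 4P + 4P^2 = I$, because of part (a).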
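
The three identities can be checked on a concrete unit vector. The choice $\mathbf{u} = (3/5, 4/5)$ is an assumption made for the example; exact fractions avoid rounding.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(X):
    return [list(col) for col in zip(*X)]

u = [Fraction(3, 5), Fraction(4, 5)]      # assumed unit vector: u^T u = 1
I = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]

P = [[ui * uj for uj in u] for ui in u]   # P = u u^T
Q = [[I[i][j] - 2 * P[i][j] for j in range(2)] for i in range(2)]  # Q = I - 2P

print(matmul(P, P) == P)    # part (a)
print(transpose(P) == P)    # part (b)
print(matmul(Q, Q) == I)    # part (c)
```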


15. Left-multiplication by an elementary matrix produces an 
elementary row operation: 


$C$ being changed into $B$ by row operations using the
inverses of the $E_i$.)

17. Since $B$ is $4 \times 6$ (with more columns than rows), its six
columns are linearly dependent and there is a nonzero $\mathbf{x}$
such that $B\mathbf{x} = \mathbf{0}$. Thus $AB\mathbf{x} = A\mathbf{0} = \mathbf{0}$, which shows that
the matrix $AB$ is not invertible, by the Invertible Matrix
Theorem.


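19. [M] To four decimal places, as $k$ increases,
\[
A^k \to \begin{bmatrix} .2857 & .2857 & .2857 \\ .4286 & .4286 & .4286 \\ .2857 & .2857 & .2857 \end{bmatrix}
\quad\text{and}\quad
B^k \to \begin{bmatrix} .2022 & .2022 & .2022 \\ .3708 & .3708 & .3708 \\ .4270 & .4270 & .4270 \end{bmatrix}
\]
or, in rational format,
\[
A^k \to \begin{bmatrix} 2/7 & 2/7 & 2/7 \\ 3/7 & 3/7 & 3/7 \\ 2/7 & 2/7 & 2/7 \end{bmatrix}
\quad\text{and}\quad
B^k \to \begin{bmatrix} 18/89 & 18/89 & 18/89 \\ 33/89 & 33/89 & 33/89 \\ 38/89 & 38/89 & 38/89 \end{bmatrix}
\]
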

Chapter 3 

Section 3.1, page 169 


1. 1   3. 0   5. $-24$   7. 4

9. 15. Start with row 3.

11. $-18$. Start with column 1 or row 4.

13. 6. Start with row 2 or column 2.

15. 24   17. $-10$

19. $ad - bc$, $cb - da$. Interchanging two rows changes the
sign of the determinant.

21. $ad - bc$, $akd - bkc = k(ad - bc)$. Scaling a row by a
constant $k$ multiplies the determinant by $k$.

23. $7a - 14b + 7c$, $-7a + 14b - 7c$. Interchanging two rows
changes the sign of the determinant.

25. 1 27. 1 29. k 


31. 1. The matrix is upper or lower triangular, with only 1's on
the diagonal. The determinant is 1, the product of the
diagonal entries.



\[
\begin{aligned}
\det EA = \det \begin{bmatrix} a + kc & b + kd \\ c & d \end{bmatrix}
&= (a + kc)d - (b + kd)c \\
&= ad + kcd - bc - kdc = (+1)(ad - bc) \\
&= (\det E)(\det A)
\end{aligned}
\]


$B \sim E_1B \sim E_2E_1B \sim E_3E_2E_1B = C$

So $B$ is row equivalent to $C$. Since row operations are
reversible, $C$ is row equivalent to $B$. (Alternatively, show



\[
\det EA = \det \begin{bmatrix} c & d \\ a & b \end{bmatrix}
= cb - ad = (-1)(ad - bc) = (\det E)(\det A)
\]
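
The identity $\det EA = (\det E)(\det A)$ for $2 \times 2$ elementary matrices can be spot-checked on one example of each type. The matrix `A` and the scalars used below are assumptions chosen for illustration.

```python
def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def matmul2(X, Y):
    """Product of two 2 x 2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[2, 5], [3, 4]]                 # assumed example, det A = -7

elementary = {
    "interchange": [[0, 1], [1, 0]],   # det E = -1
    "replacement": [[1, 3], [0, 1]],   # det E = +1 (row 1 += 3 * row 2)
    "scaling":     [[1, 0], [0, 7]],   # det E = 7
}

results = {name: det2(matmul2(E, A)) == det2(E) * det2(A)
           for name, E in elementary.items()}
print(results)
```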






















37. 5A 


15 

20 



no 


39. Hints are in the Study Guide. 

41. The area of the parallelogram and the determinant of
$[\,\mathbf{u} \;\; \mathbf{v}\,]$ both equal 6. If $\mathbf{v} = \begin{bmatrix} x \\ 2 \end{bmatrix}$ for any $x$, the area is still
6. In each case the base of the parallelogram is unchanged,
and the altitude remains 2 because the second coordinate of
$\mathbf{v}$ is always 2.

43. [M] In general, $\det A^{-1} = 1/\det A$ as long as $\det A$ is
nonzero.


45. [M] You can check your conjectures when you get to 
Section 3.2. 


Section 3.2, page 177 


1. Interchanging two rows reverses the sign of the 
determinant. 

3. Multiplying a row by 3 multiplies the determinant by 3. 
5. -3 7. 0 9. -28 11. -48 

13. 6 15. 21 17. 7 19. 14 

21. Not invertible 23. Invertible 
25. Linearly independent 27. See the Study Guide. 

29. 16 


31. Hint: Show that $(\det A)(\det A^{-1}) = 1$.
33. Hint: Use Theorem 6. 


35. Hint: Use Theorem 6 and another theorem. 


37. $\det AB = \det \begin{bmatrix} 6 & 0 \\ 17 & 4 \end{bmatrix} = 24$; $(\det A)(\det B) = 3 \cdot 8 = 24$


39. a. -12 


b. -375 


c. 4 


d. $-1/3$


e. -27 


41. $\det A = (a + e)d - (b + f)c = ad + ed - bc - fc$

$= (ad - bc) + (ed - fc) = \det B + \det C$

43. Hint: Compute det A by a cofactor expansion down 
column 3. 

45. [M] See the Study Guide after you have made a conjecture 
about A T A and AA T . 


13. adj A = 


1 -1 
1 -5 


1 


5 
1 

7 -5 


A~ l - - 
’ A ~ 6 


1 -1 
1 -5 


1 


5 
1 

7 -5 


15. adj A = 


17. If A = 


a 

c 


-1 

0 

0" 

,41-*- 1 

~-l 

0 

-1 

-5 

0 

-1 

-5 

-1 

-15 

5_ 

5 

-1 

-15 

b 

d 

, then 

C u 

= d , C 12 = - 

~c , C 21 

= -b. 


0 

0 

5 


$C_{22} = a$. The adjugate matrix is the transpose of cofactors:

\[
\operatorname{adj} A = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}
\]


Following Theorem 8, we divide by det A; this produces the 
formula from Section 2.2. 
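
A minimal sketch of the $2 \times 2$ inverse obtained from the adjugate, $A^{-1} = \frac{1}{\det A}\operatorname{adj} A$. The example matrix and function names are assumptions for illustration.

```python
from fractions import Fraction

def adjugate2(M):
    """Adjugate of a 2 x 2 matrix: the transpose of the matrix of cofactors."""
    (a, b), (c, d) = M
    return [[d, -b], [-c, a]]

def inverse2(M):
    """Inverse of a 2 x 2 integer matrix as (1/det M) * adj M."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(entry, det) for entry in row] for row in adjugate2(M)]

A = [[2, 5], [3, 4]]      # assumed example with det A = -7
A_inv = inverse2(A)

# Multiply A by the computed inverse to confirm the result is the identity.
product = [[sum(A[i][k] * A_inv[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
print(A_inv, product)
```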


19. 8 


21. 3 


23. 23 


25. A $3 \times 3$ matrix $A$ is not invertible if and only if its columns
are linearly dependent (by the Invertible Matrix Theorem). 
This happens if and only if one of the columns is in the 
plane spanned by the other two columns, which is 
equivalent to the condition that the parallelepiped 
determined by these columns has zero volume, which in 
turn is equivalent to the condition that det A = 0. 

27. 12   29. $\frac{1}{2}\,|\det [\,\mathbf{v}_1 \;\; \mathbf{v}_2\,]|$

31. a. See Example 5.   b. $4\pi abc/3$

33. [M] In MATLAB, the entries in B - inv(A) are
approximately $10^{-15}$ or smaller. See the Study Guide for
suggestions that may save you keystrokes as you work.

35. [M] MATLAB Student Version 4.0 uses 57,771 flops for
inv(A), and 14,269,045 flops for the inverse formula. The
inv(A) command requires only about 0.4% of the
operations for the inverse formula. The Study Guide shows
how to use the flops command.


Section 3.3, page 186 




1/4 

11/4 

3/8 


7. $s \neq \pm\sqrt{3}$; $x_1 = \dfrac{5s + 4}{6(s^2 - 3)}$, $x_2 = \dfrac{-4s - 15}{4(s^2 - 3)}$


9. s 7 ^ 0,1; 
11 . adj A = 


7 4^ + 3 


X\ = 

3 {s 

— 1) ’ 2 6s{s 

0 

1 

0" 


-5 

-1 

-5 

, A-' = 

5 

2 

10 



1 

5 





Chapter 3 Supplementary Exercises, page 188 


1. a. T 

b. T 

c. F 

d. F 

e. F 

f. F 

g. T

h. T

i. F

j. F

k. T

l. F

m. F

n. T

o. F

p. T 


The solution for Exercise 3 is based on the fact that if a matrix 
contains two rows (or two columns) that are multiples of each 
other, then the determinant of the matrix is zero, by Theorem 4, 
because the matrix cannot be invertible. 
3. Make two row replacement operations, and then factor out a
common multiple in row 2 and a common multiple in row 3.


11 


13 


15 


17 . 


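\[
\begin{vmatrix} 1 & a & b+c \\ 1 & b & a+c \\ 1 & c & a+b \end{vmatrix}
= \begin{vmatrix} 1 & a & b+c \\ 0 & b-a & a-b \\ 0 & c-a & a-c \end{vmatrix}
= (b-a)(c-a) \begin{vmatrix} 1 & a & b+c \\ 0 & 1 & -1 \\ 0 & 1 & -1 \end{vmatrix}
= 0
\]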


5. -12 


7. When the determinant is expanded by cofactors of the first
row, the equation has the form $ax + by + c = 0$, where at
least one of $a$ and $b$ is not zero. This is the equation of a
line. It is clear that $(x_1, y_1)$ and $(x_2, y_2)$ are on the line,
because when the coordinates of one of the points are
substituted for $x$ and $y$, two rows of the matrix are equal
and so the determinant is zero.

2 


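9. $T \sim \begin{bmatrix} 1 & a & a^2 \\ 0 & b-a & b^2-a^2 \\ 0 & c-a & c^2-a^2 \end{bmatrix}$. Thus, by Theorem 3,

\[
\begin{aligned}
\det T &= (b-a)(c-a) \det \begin{bmatrix} 1 & a & a^2 \\ 0 & 1 & b+a \\ 0 & 1 & c+a \end{bmatrix} \\
&= (b-a)(c-a) \det \begin{bmatrix} 1 & a & a^2 \\ 0 & 1 & b+a \\ 0 & 0 & c-b \end{bmatrix} \\
&= (b-a)(c-a)(c-b)
\end{aligned}
\]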
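
The closed form $(b-a)(c-a)(c-b)$ can be compared against a direct cofactor expansion. The sample values of $a$, $b$, $c$ below are assumptions chosen for the check.

```python
def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion across row 1."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def vandermonde3(a, b, c):
    """The matrix with rows (1, t, t^2) for t = a, b, c."""
    return [[1, t, t * t] for t in (a, b, c)]

def closed_form(a, b, c):
    return (b - a) * (c - a) * (c - b)

samples = [(2, 3, 5), (-1, 0, 4), (1, 1, 7)]   # last sample has a repeated value
for a, b, c in samples:
    print((a, b, c), det3(vandermonde3(a, b, c)), closed_form(a, b, c))
```

The sample with a repeated value gives determinant 0, matching the fact that two equal rows force a zero determinant.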

Area = 12. If one vertex is subtracted from all four vertices,
and if the new vertices are $\mathbf{0}$, $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$, then the
translated figure (and hence the original figure) will be a
parallelogram if and only if one of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ is the sum
of the other two vectors.

By the Inverse Formula, $(\operatorname{adj} A) \cdot \dfrac{1}{\det A}\,A = A^{-1}A = I$. By
the Invertible Matrix Theorem, $\operatorname{adj} A$ is invertible and
$(\operatorname{adj} A)^{-1} = \dfrac{1}{\det A}\,A$.

a. $X = CA^{-1}$, $Y = D - CA^{-1}B$. Now use Exercise
14(c).

b. From part (a), and the multiplicative property of
determinants,

\[
\begin{aligned}
\det \begin{bmatrix} A & B \\ C & D \end{bmatrix}
&= \det [A(D - CA^{-1}B)] \\
&= \det [AD - ACA^{-1}B] \\
&= \det [AD - CAA^{-1}B] \\
&= \det [AD - CB]
\end{aligned}
\]

where the equality $AC = CA$ was used in the third step.

First consider the case n = 2, and prove that the result 
holds by directly computing the determinants of B and C . 
Now assume that the formula holds for all
$(k-1) \times (k-1)$ matrices, and let $A$, $B$, and $C$ be $k \times k$


matrices. Use a cofactor expansion along the first column 
and the inductive hypothesis to find det B . Use row 
replacement operations on C to create zeros below the first 
pivot and produce a triangular matrix. Find the determinant 
of this matrix and add to det B to get the result. 


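19. [M] Compute:

\[
\begin{vmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{vmatrix} = 1, \qquad
\begin{vmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 2 & 2 \\ 1 & 2 & 3 & 3 \\ 1 & 2 & 3 & 4 \end{vmatrix} = 1, \qquad
\begin{vmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 2 & 2 & 2 \\ 1 & 2 & 3 & 3 & 3 \\ 1 & 2 & 3 & 4 & 4 \\ 1 & 2 & 3 & 4 & 5 \end{vmatrix} = 1
\]

Conjecture:

\[
\begin{vmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 2 & \cdots & 2 \\ 1 & 2 & 3 & \cdots & 3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 2 & 3 & \cdots & n \end{vmatrix} = 1
\]

To confirm the conjecture, use row replacement operations
to create zeros below the first pivot, then the second pivot,
and so on. The resulting matrix is

\[
\begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 0 & 1 & 1 & \cdots & 1 \\ 0 & 0 & 1 & \cdots & 1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}
\]

which is an upper triangular matrix with determinant 1.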
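
The conjecture can be tested for a few more sizes. In the sketch below, the $(i, j)$ entry of the matrix is taken to be $\min(i, j)$, which matches the pattern of the matrices shown above; the helper `det` is an assumed exact-arithmetic implementation, not the matrix program used in the text.

```python
from fractions import Fraction

def det(rows):
    """Determinant by row reduction to triangular form, in exact arithmetic."""
    M = [[Fraction(x) for x in row] for row in rows]
    n = len(M)
    sign = 1
    result = Fraction(1)
    for c in range(n):
        pivot = next((i for i in range(c, n) if M[i][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            sign = -sign                  # an interchange flips the sign
        result *= M[c][c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[c])]  # row replacement
    return sign * result

def min_matrix(n):
    """n x n matrix whose (i, j) entry is min(i, j), with 1-based indices."""
    return [[min(i, j) for j in range(1, n + 1)] for i in range(1, n + 1)]

dets = {n: det(min_matrix(n)) for n in range(2, 9)}
print(dets)
```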


Chapter 4 

Section 4.1, page 197 


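1. a. $\mathbf{u} + \mathbf{v}$ is in $V$ because its entries will both be
nonnegative.

b. Example: If $\mathbf{u} = \begin{bmatrix} 2 \\ 2 \end{bmatrix}$ and $c = -1$, then $\mathbf{u}$ is in $V$, but
$c\mathbf{u}$ is not in $V$.

3. Example: If $\mathbf{u} = \begin{bmatrix} .5 \\ .5 \end{bmatrix}$ and $c = 4$, then $\mathbf{u}$ is in $H$, but $c\mathbf{u}$ is
not in $H$.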


5. Yes, by Theorem 1, because the set is Span $\{t^2\}$.

7. No, the set is not closed under multiplication by scalars that 
are not integers. 


9. $H$ = Span $\{\mathbf{v}\}$, where $\mathbf{v} = \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}$. By Theorem 1, $H$ is a
subspace of $\mathbb{R}^3$.





































11. W — Span {u, v}, where u = 


1 

_1 


1 

<N 

1_ 

1 

, V = 

0 

1 

O 

1_ 


1 

_1 


•By 


7. W is not a subspace of R 3 because the zero vector (0,0,0) 
is not in W . 


Theorem 1, W is a subspace of 



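13. a. There are only three vectors in $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$, and $\mathbf{w}$ is not
one of them.

b. There are infinitely many vectors in Span $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$.

c. $\mathbf{w}$ is in Span $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$.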

15. Not a vector space because the zero vector is not in W 


9. $W$ is a subspace of $\mathbb{R}^4$ because $W$ is the set of solutions of
the system

$a - 2b - 4c = 0$

$2a - c - 3d = 0$

11. W is not a subspace because 0 is not in W. Justification: If 
a typical element (b — 2d, 5 + d,b + 3d, d) were zero, 


17. S = < 



1 


-1 


0 

> 

then 5 + d = 0 and d 

— 

0, which 

< 

0 


1 

% 

-1 


> 

"1 

-6" 


-1 

/ 

0 

7 

1 


13. IT = Col A for A = 

0 

1 

< 

0 _ 


1 _ 


0 _ 

> 


1 

0 


, so IT is a vector space 


19. Hint: Use Theorem 1. 

Warning: Although the Study Guide has complete solutions for 
every odd-numbered exercise whose answer here is only a 
“Hint,” you must really try to work the solution yourself. 
Otherwise, you will not benefit from the exercise. 

21. Yes. The conditions for a subspace are obviously satisfied: 
The zero matrix is in H , the sum of two upper triangular 
matrices is upper triangular, and any scalar multiple of an 
upper triangular matrix is again upper triangular. 

23. See the Study Guide after you have written your answers. 


25. 4 


27. a. 8 


b. 3 


c. 5 


d. 4 


29. $\mathbf{u} + (-1)\mathbf{u} = 1\mathbf{u} + (-1)\mathbf{u}$   Axiom 10

$= [1 + (-1)]\mathbf{u}$   Axiom 8

$= 0\mathbf{u} = \mathbf{0}$   Exercise 27

From Exercise 26, it follows that $(-1)\mathbf{u} = -\mathbf{u}$.

31. Any subspace H that contains u and v must also contain all 
scalar multiples of u and v and hence must contain all sums 
of scalar multiples of u and v. Thus H must contain 
Span {u, v}. 

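33. Hint: For part of the solution, consider $\mathbf{w}_1$ and $\mathbf{w}_2$ in
$H + K$, and write $\mathbf{w}_1$ and $\mathbf{w}_2$ in the form $\mathbf{w}_1 = \mathbf{u}_1 + \mathbf{v}_1$ and
$\mathbf{w}_2 = \mathbf{u}_2 + \mathbf{v}_2$, where $\mathbf{u}_1$ and $\mathbf{u}_2$ are in $H$, and $\mathbf{v}_1$ and $\mathbf{v}_2$ are
in $K$.

35. [M] The reduced echelon form of $[\,\mathbf{v}_1 \;\; \mathbf{v}_2 \;\; \mathbf{v}_3 \;\; \mathbf{w}\,]$ shows
that $\mathbf{w} = \mathbf{v}_1 - 2\mathbf{v}_2 + \mathbf{v}_3$.

37. [M] The functions are $\cos 4t$ and $\cos 6t$. See Exercise 34 in
Section 4.5.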


Section 4.2, page 207 


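1. $\begin{bmatrix} 3 & -5 & -3 \\ 6 & -2 & 0 \\ -8 & 4 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 3 \\ -4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$, so $\mathbf{w}$ is in Nul $A$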


3. 


7 

4 

1 

0 


-6 

2 

0 

1 


5. 


2 

1 

0 

0 

0 


-4 

0 

9 

1 

0 


15. 


21 . 


by Theorem 3. 


0 

1 

4 

3 


17. a. 2 


2 3 

1 -2 
1 0 
-1 -1 

b. 4 


19. a. 5 


b. 2 


3 

1 


in Nul A, 


2 

1 

4 

3 


in Col A . Other answers possible 


23. w is in both Nul A and Col A. 

25. See the Study Guide. By now you should know how to use 
it properly. 


27. Let x = 

3" 

2 

and A = 

1 

-2 

-3 

4 

-3" 

2 


-1 


-1 

5 

7 


. Then x is in 


Nul $A$. Since Nul $A$ is a subspace of $\mathbb{R}^3$, $10\mathbf{x}$ is in Nul $A$.

29. a. $A\mathbf{0} = \mathbf{0}$, so the zero vector is in Col $A$.

b. By a property of matrix multiplication, 

Ax + Aw = A(x + w), which shows that Ax + Aw is a 
linear combination of the columns of A and hence is in 
Col A. 

c. c(Ax) = A(cx), which shows that c(Ax) is in Col A for 
all scalars c . 

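31. a. For arbitrary polynomials $\mathbf{p}$, $\mathbf{q}$ in $\mathbb{P}_2$ and any scalar $c$,

\[
\begin{aligned}
T(\mathbf{p} + \mathbf{q}) &= \begin{bmatrix} (\mathbf{p} + \mathbf{q})(0) \\ (\mathbf{p} + \mathbf{q})(1) \end{bmatrix}
= \begin{bmatrix} \mathbf{p}(0) + \mathbf{q}(0) \\ \mathbf{p}(1) + \mathbf{q}(1) \end{bmatrix} \\
&= \begin{bmatrix} \mathbf{p}(0) \\ \mathbf{p}(1) \end{bmatrix} + \begin{bmatrix} \mathbf{q}(0) \\ \mathbf{q}(1) \end{bmatrix}
= T(\mathbf{p}) + T(\mathbf{q}) \\
T(c\mathbf{p}) &= \begin{bmatrix} c\mathbf{p}(0) \\ c\mathbf{p}(1) \end{bmatrix}
= c \begin{bmatrix} \mathbf{p}(0) \\ \mathbf{p}(1) \end{bmatrix} = cT(\mathbf{p})
\end{aligned}
\]

So $T$ is a linear transformation from $\mathbb{P}_2$ into $\mathbb{R}^2$.

b. Any quadratic polynomial that vanishes at 0 and 1 must
be a multiple of $\mathbf{p}(t) = t(t - 1)$. The range of $T$ is $\mathbb{R}^2$.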
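
The two linearity properties and the kernel element $t(t-1)$ can be checked on sample polynomials. In this sketch a polynomial $c_0 + c_1t + c_2t^2$ is stored as the list `[c0, c1, c2]`; the sample polynomials and the scalar are assumptions for illustration.

```python
def evaluate(p, t):
    """Value at t of the polynomial with coefficient list p = [c0, c1, c2]."""
    return sum(c * t ** k for k, c in enumerate(p))

def T(p):
    """The map T(p) = (p(0), p(1))."""
    return (evaluate(p, 0), evaluate(p, 1))

p = [1, 2, 3]       # 1 + 2t + 3t^2 (assumed sample)
q = [4, 0, -1]      # 4 - t^2       (assumed sample)
c = 5

p_plus_q = [a + b for a, b in zip(p, q)]
c_times_p = [c * a for a in p]

additive = T(p_plus_q) == tuple(a + b for a, b in zip(T(p), T(q)))
homogeneous = T(c_times_p) == tuple(c * a for a in T(p))
kernel_element = T([0, -1, 1])    # t(t - 1) = -t + t^2

print(additive, homogeneous, kernel_element)
```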
33. a. For $A$, $B$ in $M_{2\times 2}$ and any scalar $c$,

\[
\begin{aligned}
T(A + B) &= (A + B) + (A + B)^T \\
&= A + B + A^T + B^T \qquad \text{Transpose property} \\
&= (A + A^T) + (B + B^T) = T(A) + T(B) \\
T(cA) &= (cA) + (cA)^T = cA + cA^T \\
&= c(A + A^T) = cT(A)
\end{aligned}
\]

So $T$ is a linear transformation from $M_{2\times 2}$ into $M_{2\times 2}$.

b. If $B$ is any element in $M_{2\times 2}$ with the property that
$B^T = B$, and if $A = \frac{1}{2}B$, then

$T(A) = \frac{1}{2}B + (\frac{1}{2}B)^T = \frac{1}{2}B + \frac{1}{2}B = B$

c. Part (b) showed that the range of $T$ contains all $B$ such
that $B^T = B$. So it suffices to show that any $B$ in the
range of $T$ has this property. If $B = T(A)$, then by
properties of transposes,

$B^T = (A + A^T)^T = A^T + A^{TT} = A^T + A = B$

d. The kernel of $T$ is $\left\{ \begin{bmatrix} 0 & b \\ -b & 0 \end{bmatrix} : b \text{ real} \right\}$.

35. Hint: Check the three conditions for a subspace. Typical
elements of $T(U)$ have the form $T(\mathbf{u}_1)$ and $T(\mathbf{u}_2)$, where
$\mathbf{u}_1$ and $\mathbf{u}_2$ are in $U$.



37. [M] w is in Col A but not in Nul A . (Explain why.) 


39. [M] The reduced echelon form of $A$ is

\[
\begin{bmatrix} 1 & 0 & 1/3 & 0 & 10/3 \\ 0 & 1 & 1/3 & 0 & -26/3 \\ 0 & 0 & 0 & 1 & -4 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\]


Section 4.3, page 215 


1. Yes, the $3 \times 3$ matrix $A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}$ has 3 pivot
positions. By the Invertible Matrix Theorem, $A$ is invertible
and its columns form a basis for $\mathbb{R}^3$. (See Example 3.)



3. No, the vectors are linearly dependent and do not span $\mathbb{R}^3$.

5. No, the set is linearly dependent because the zero vector is 
in the set. However, 


1 

-2 

0 

1 

0 


1 

-2 

0 

1 

0 

-3 

9 

0 

-3 


0 

3 

0 

-3 

1 

0 

0 

0 

Lh 

1_ 


1 

0 

0 

0 

1 

kn 


The matrix has pivots in each row and hence its columns 
span R 3 . 

7. No, the vectors are linearly independent because they are
not multiples. (More precisely, neither vector is a multiple
of the other.) However, the vectors do not span $\mathbb{R}^3$. The
matrix $\begin{bmatrix} -2 & 6 \\ 3 & -1 \\ 0 & 5 \end{bmatrix}$ can have at most two pivots since it has
only two columns. So there will not be a pivot in each row.



13. Basis for Nul A: 


-6 


-5 

-5/2 


-3/2 

1 

7 

0 

0 


1 


Basis for Col A: 

-2 

2 

5 

1 

1_ 


1 

LO 

1_ 


1 

OO 

_1 


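15. $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_4\}$   17. [M] $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$

19. The three simplest answers are $\{\mathbf{v}_1, \mathbf{v}_2\}$ or $\{\mathbf{v}_1, \mathbf{v}_3\}$ or
$\{\mathbf{v}_2, \mathbf{v}_3\}$. Other answers are possible.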


21. See the Study Guide for hints. 

23. Hint: Use the Invertible Matrix Theorem. 

25. No. (Why is the set not a basis for $H$?)

27. $\{\cos \omega t, \sin \omega t\}$

29. Let $A$ be the $n \times k$ matrix $[\,\mathbf{v}_1 \;\cdots\; \mathbf{v}_k\,]$. Since $A$ has
fewer columns than rows, there cannot be a pivot position
in each row of $A$. By Theorem 4 in Section 1.4, the columns
of $A$ do not span $\mathbb{R}^n$ and hence are not a basis for $\mathbb{R}^n$.

31. Hint: If $\{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ is linearly dependent, then there exist
$c_1, \dots, c_p$, not all zero, such that $c_1\mathbf{v}_1 + \cdots + c_p\mathbf{v}_p = \mathbf{0}$.
Use this equation.


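33. Neither polynomial is a multiple of the other polynomial, so
$\{\mathbf{p}_1, \mathbf{p}_2\}$ is a linearly independent set in $\mathbb{P}_3$.

35. Let $\{\mathbf{v}_1, \mathbf{v}_3\}$ be any linearly independent set in the vector
space $V$, and let $\mathbf{v}_2$ and $\mathbf{v}_4$ be linear combinations of $\mathbf{v}_1$ and
$\mathbf{v}_3$. Then $\{\mathbf{v}_1, \mathbf{v}_3\}$ is a basis for Span $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$.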


37. [M] You could be clever and find special values of t that 
produce several zeros in (5), and thereby create a system of 
equations that can be solved easily by hand. Or, you could 
use values of t such as t = 0, .1, .2,... to create a system of 
equations that you can solve with a matrix program. 


Section 4.4, page 224 




15. The Study Guide has hints. 
17.


21 . 


23. 


25. 


1 

1 


= 5vi — 2v 2 = 10 vi — 3v 2 + v 3 (infinitely many 


answers) 


19. Hint: By hypothesis, the zero vector has a unique 

representation as a linear combination of elements of S 


9 

4 


2 

1 


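Hint: Suppose that $[\mathbf{u}]_{\mathcal{B}} = [\mathbf{w}]_{\mathcal{B}}$ for some $\mathbf{u}$ and $\mathbf{w}$ in $V$, and
denote the entries in $[\mathbf{u}]_{\mathcal{B}}$ by $c_1, \dots, c_n$. Use the definition
of $[\mathbf{u}]_{\mathcal{B}}$.

One possible approach: First, show that if $\mathbf{u}_1, \dots, \mathbf{u}_p$ are
linearly dependent, then $[\mathbf{u}_1]_{\mathcal{B}}, \dots, [\mathbf{u}_p]_{\mathcal{B}}$ are linearly
dependent. Second, show that if $[\mathbf{u}_1]_{\mathcal{B}}, \dots, [\mathbf{u}_p]_{\mathcal{B}}$ are
linearly dependent, then $\mathbf{u}_1, \dots, \mathbf{u}_p$ are linearly dependent.
Use the two equations displayed in the exercise. A slightly
different proof is given in the Study Guide.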


27. Linearly independent. (Justify answers to Exercises 27-34.) 
29. Linearly dependent 



1 


-3 


-4 


1 

31. a. The coordinate vectors 

-3 

5 

5 

5 

5 

5 

0 


5 


-7 


-6 


-1 


do not span $\mathbb{R}^3$. Because of the isomorphism between
$\mathbb{R}^3$ and $\mathbb{P}_2$, the corresponding polynomials do not span
$\mathbb{P}_2$.



0 


1 


-3 


2 

b. The coordinate vectors 

5 


-8 

5 

4 


-3 


1 


-2 


2 


0 


span $\mathbb{R}^3$. Because of the isomorphism between $\mathbb{R}^3$ and
$\mathbb{P}_2$, the corresponding polynomials span $\mathbb{P}_2$.



and 


33. [M] The coordinate vectors 


1 

LO 

_1 


5 


1 

O 

1_ 


1 

7 


1 


1 


16 

0 

5 

0 

5 

-2 

5 

-6 

1 

0 

1 _ 


1 

CN 

_1 


1 

0 

_ 1 


1 

CN 

_1 


are a linearly dependent subset of $\mathbb{R}^4$. Because of the
isomorphism between $\mathbb{R}^4$ and $\mathbb{P}_3$, the corresponding
polynomials form a linearly dependent subset of $\mathbb{P}_3$, and
thus cannot be a basis for $\mathbb{P}_3$.


35. [M] [x] B = 


5/3 

8/3 


37. [M] 


1.3 

0 

0.8 


Section 4.5, page 231 


1. 


3. 


1 


1 

CN 

1_ 

1 

5 

1 

1 

O 

1_ 


1 

1_ 

"0" 


1 

O 

1 


-1 

0 

5 

1 

1 


1 

CN 

_1 


; dim is 2 


2 

0 

3 

0 


; dim is 3 


5. 


r 


1 

1_ 

2 


5 

-1 

5 

0 

1 

OJ 

1_ 


7 


; dim is 2 


7. No basis; dim is 0 


9. 2 


11 . 2 


13. 2,3 


15. 2,2 


17. 0,3 


19. See the Study Guide. 


21. Hint: You need only show that the first four Hermite 
polynomials are linearly independent. Why? 

23. $[\mathbf{p}]_{\mathcal{B}} = (3, 3, -2, \tfrac{3}{2})$

25. Hint: Suppose S does span V, and use the Spanning Set 
Theorem. This leads to a contradiction, which shows that 
the spanning hypothesis is false. 

27. Hint: Use the fact that each $\mathbb{P}_n$ is a subspace of $\mathbb{P}$.


29. Justify each answer. 


a. True 


b. True 


c. True 


31. Hint: Since $H$ is a nonzero subspace of a finite-dimensional
space, $H$ is finite-dimensional and has a basis, say,
$\mathbf{v}_1, \dots, \mathbf{v}_p$. First show that $\{T(\mathbf{v}_1), \dots, T(\mathbf{v}_p)\}$ spans $T(H)$.


33. 


[M] a. One basis is $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{e}_2, \mathbf{e}_3\}$. In fact, any two of
the vectors $\mathbf{e}_2, \dots, \mathbf{e}_5$ will extend $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ to a basis of



Section 4.6, page 238 


1 


rank A = 2; dimNul A = 2; 

1 

Basis for Col A: —1 

5 

Basis for Row A: (1,0, —1,5), (0, —2, 5, —6) 

1 



1 

1_ 

5 

2 


1 

VO 

_1 


Basis for Nul A: 


5/2 

1 

0 



1 

Ul 
_1 


-3 

5 

0 


1 


3 


rank A = 3; dim Nul A = 2; 

2 


Basis for Col A: 


-2 

4 

-2 



1 

VO 

1 _ 


1 

CN 

1 _ 


-3 


-3 

5 

9 

n 

5 


1 

LO 

1 _ 


1 

'xj- 

_1 


Row A: (2, -3, 6,2, 5), (0,0, 3, -1,1), (0,0,0,1,3) 

'3/2 

1 
0 
0 


Basis for Nul A: 


0 



9/2“ 


0 

5 

-4/3 


-3 


1 


5. 5,3,3 


7 


Yes; no. Since Col $A$ is a four-dimensional subspace of $\mathbb{R}^4$,
it coincides with $\mathbb{R}^4$. The null space cannot be $\mathbb{R}^3$, because
the vectors in Nul $A$ have 7 entries. Nul $A$ is a
three-dimensional subspace of $\mathbb{R}^7$, by the Rank Theorem.


9. 2 


11. 3 


13. 5, 5. In both cases, the number of pivots cannot exceed the 
number of columns or the number of rows. 
15.

19. 

21 . 

23. 

25. 

27. 

29. 






2 17. See the Study Guide. 

Yes. Try to write an explanation before you consult the 
Study Guide. 

No. Explain why. 


Yes. Only six homogeneous linear equations are necessary. 


No. Explain why. 


Row $A$ and Nul $A$ are in $\mathbb{R}^n$; Col $A$ and Nul $A^T$ are in $\mathbb{R}^m$.
There are only four distinct subspaces because
Row $A^T$ = Col $A$ and Col $A^T$ = Row $A$.


Recall that dim Col $A = m$ precisely when Col $A = \mathbb{R}^m$, or
equivalently, when the equation $A\mathbf{x} = \mathbf{b}$ is consistent for all
$\mathbf{b}$. By Exercise 28(b), dim Col $A = m$ precisely when
dim Nul $A^T = 0$, or equivalently, when the equation
$A^T\mathbf{x} = \mathbf{0}$ has only the trivial solution.



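$\mathbf{u}\mathbf{v}^T = \begin{bmatrix} 2a & 2b & 2c \\ -3a & -3b & -3c \\ 5a & 5b & 5c \end{bmatrix}$. The columns are all
multiples of $\mathbf{u}$, so Col $\mathbf{u}\mathbf{v}^T$ is one-dimensional, unless $a = b = c = 0$.


Hint: Let $A = [\,\mathbf{u} \;\; \mathbf{u}_2 \;\; \mathbf{u}_3\,]$. If $\mathbf{u} \neq \mathbf{0}$, then $\mathbf{u}$ is a basis for
Col $A$. Why?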







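[M] a. Many answers are possible. Here are the
“canonical” choices, for $A = [\,\mathbf{a}_1 \;\; \mathbf{a}_2 \;\cdots\; \mathbf{a}_7\,]$:

\[
C = [\,\mathbf{a}_1 \;\; \mathbf{a}_2 \;\; \mathbf{a}_4 \;\; \mathbf{a}_6\,], \qquad
N = \begin{bmatrix} -13/2 & -5 & 3 \\ -11/2 & -1/2 & -2 \\ 1 & 0 & 0 \\ 0 & 11/2 & -7 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \\ 0 & 0 & 1 \end{bmatrix}
\]

\[
R = \begin{bmatrix} 1 & 0 & 13/2 & 0 & 5 & 0 & -3 \\ 0 & 1 & 11/2 & 0 & 1/2 & 0 & 2 \\ 0 & 0 & 0 & 1 & -11/2 & 0 & 7 \\ 0 & 0 & 0 & 0 & 0 & 1 & 1 \end{bmatrix}
\]
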


b. $M = [\,2 \;\; 41 \;\; 0 \;\; -28 \;\; 11\,]^T$. The matrix $[\,R^T \;\; N\,]$
is $7 \times 7$ because the columns of $R^T$ and $N$ are in $\mathbb{R}^7$,
and dim Row $A$ + dim Nul $A$ = 7. The matrix $[\,C \;\; M\,]$
is $5 \times 5$ because the columns of $C$ and $M$ are in $\mathbb{R}^5$ and
dim Col $A$ + dim Nul $A^T$ = 5, by Exercise 28(b). The
invertibility of these matrices follows from the fact that
their columns are linearly independent, which can be
proved from Theorem 3 in Section 6.1.

[M] The C and R given for Exercise 35 work here, and 
A = CR. 


7. 

9. 

11 . 


P _ 

~-3 

r 

P _ 

~-2 

1 

C<—B ~ 

_—5 

2 

» B<-C ~ 

_—5 

3 

P _ 

9 

-2" 

P _ 

"1 

2" 


-4 

1 

’ B^C ~ 

4 

9 


See the Study Guide. 


13 P - 

C^B ~ 

1 

-2 

3 

-5 

0 " 

2 

» [ — 1 + 2t]i3 — 

5" 

-2 


1 

4 

3 


1 


15. a. B is a basis for V. 

b. The coordinate mapping is a linear transformation. 

c. The product of a matrix and a vector 

d. The coordinate vector of v relative to B 


17. a. [M] 



32 

0 

16 

0 

12 

0 

10 



32 

0 

24 

0 

20 

0 

p -i _ 1 



16 

0 

16 

0 

15 




8 

0 

10 

0 

32 





4 

0 

6 







2 

0 








1 


b. $P$ is the change-of-coordinates matrix from $\mathcal{C}$ to $\mathcal{B}$. So
$P^{-1}$ is the change-of-coordinates matrix from $\mathcal{B}$ to $\mathcal{C}$,
by equation (5), and the columns of this matrix are the
$\mathcal{C}$-coordinate vectors of the basis vectors in $\mathcal{B}$, by
Theorem 15.


19. [M] Hint: Let $\mathcal{C}$ be the basis $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. Then the columns
of $P$ are $[\mathbf{u}_1]_{\mathcal{C}}$, $[\mathbf{u}_2]_{\mathcal{C}}$, and $[\mathbf{u}_3]_{\mathcal{C}}$. Use the definition of
$\mathcal{C}$-coordinate vectors and matrix algebra to compute $\mathbf{u}_1$, $\mathbf{u}_2$,
and $\mathbf{u}_3$. The solution method is discussed in the Study
Guide. Here are the numerical answers:



~-6“ 


1 

VO 

1_ 


~-5“ 

a. Ui = 

-5 

21 

, u 2 = 

-9 

_ 32 _ 

, u 3 = 

0 

3 



" 28“ 


38“ 


21 

b. Wi = 

-9 

-3 

, w 2 = 

-13 

2 

, w 3 = 

1 

1_ 


Section 4.8, page 253 

1. If $y_k = 2^k$, then $y_{k+1} = 2^{k+1}$ and $y_{k+2} = 2^{k+2}$.
Substituting these formulas into the left side of the equation
gives

\[
\begin{aligned}
y_{k+2} + 2y_{k+1} - 8y_k &= 2^{k+2} + 2 \cdot 2^{k+1} - 8 \cdot 2^k \\
&= 2^k(2^2 + 2 \cdot 2 - 8) \\
&= 2^k(0) = 0 \quad \text{for all } k
\end{aligned}
\]


Section 4.7, page 244 



Since the difference equation holds for all $k$, $2^k$ is a
solution. A similar calculation works for $y_k = (-4)^k$.

3. The signals $2^k$ and $(-4)^k$ are linearly independent because
neither is a multiple of the other. For instance, there is no
scalar $c$ such that $2^k = c(-4)^k$ for all $k$. By Theorem 17,
the solution set $H$ of the difference equation in Exercise 1
is two-dimensional. By the Basis Theorem in Section 4.5,
the two linearly independent signals $2^k$ and $(-4)^k$ form a
basis for $H$.

5. If $y_k = (-3)^k$, then

\[
\begin{aligned}
y_{k+2} + 6y_{k+1} + 9y_k &= (-3)^{k+2} + 6(-3)^{k+1} + 9(-3)^k \\
&= (-3)^k[(-3)^2 + 6(-3) + 9] \\
&= (-3)^k(0) = 0 \quad \text{for all } k
\end{aligned}
\]

Similarly, if $y_k = k(-3)^k$, then


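29. $\mathbf{x}_{k+1} = A\mathbf{x}_k$, where

$$A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 9 & -6 & -8 & 6 \end{bmatrix}, \quad \mathbf{x}_k = \begin{bmatrix} y_k \\ y_{k+1} \\ y_{k+2} \\ y_{k+3} \end{bmatrix}$$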


31. The equation holds for all k, so it holds with k replaced by 
k — 1, which transforms the equation into 


$$\begin{aligned} y_{k+2} + 6y_{k+1} + 9y_k &= (k+2)(-3)^{k+2} + 6(k+1)(-3)^{k+1} + 9k(-3)^k \\ &= (-3)^k[(k+2)(-3)^2 + 6(k+1)(-3) + 9k] \\ &= (-3)^k[9k + 18 - 18k - 18 + 9k] \\ &= (-3)^k(0) = 0 \quad \text{for all } k \end{aligned}$$

Thus both $(-3)^k$ and $k(-3)^k$ are in the solution space $H$ of
the difference equation. Also, there is no scalar $c$ such that
$k(-3)^k = c(-3)^k$ for all $k$, because $c$ must be chosen
independently of $k$. Likewise, there is no scalar $c$ such that
$(-3)^k = ck(-3)^k$ for all $k$. So the two signals are linearly
independent. Since $\dim H = 2$, the signals form a basis for
$H$, by the Basis Theorem.

7. Yes 9. Yes 

11. No, two signals cannot span the three-dimensional solution 
space. 

13. $(\tfrac{1}{3})^k$, $(\tfrac{2}{3})^k$   15. $5^k$, $(-5)^k$

17. $y_k = c_1(.8)^k + c_2(.5)^k + 10 \to 10$ as $k \to \infty$

19. $y_k = c_1(-2 + \sqrt{3})^k + c_2(-2 - \sqrt{3})^k$

21. 7, 5,4, 3,4, 5, 6, 6, 7, 8,9, 8, 7; see figure below. 



23. a. $y_{k+1} - 1.01y_k = -450$, $y_0 = 10{,}000$

b. [M] MATLAB code:

    pay = 450, y = 10000, m = 0
    table = [0 ; y]
    while y > 450
        y = 1.01*y - pay
        m = m + 1
        table = [table [m ; y]]   % append new column
    end
    m, y


$y_{k+2} + 5y_{k+1} + 6y_k = 0$ for all $k$
The equation is of order 2.

33. For all $k$, the Casorati matrix $C(k)$ is not invertible. In this
case, the Casorati matrix gives no information about the
linear independence/dependence of the set of signals. In
fact, neither signal is a multiple of the other, so they are
linearly independent.

35. Hint: Verify the two properties that define a linear
transformation. For $\{y_k\}$ and $\{z_k\}$ in $\mathbb{S}$, study
$T(\{y_k\} + \{z_k\})$. Note that if $r$ is any scalar, then the $k$th
term of $r\{y_k\}$ is $ry_k$, so $T(r\{y_k\})$ is the sequence $\{w_k\}$
given by

$$w_k = ry_{k+2} + a(ry_{k+1}) + b(ry_k)$$

37. $(TD)(y_0, y_1, y_2, \ldots) = T(D(y_0, y_1, y_2, \ldots)) = T(0, y_0, y_1, y_2, \ldots) = (y_0, y_1, y_2, \ldots) = I(y_0, y_1, y_2, \ldots)$,

while $(DT)(y_0, y_1, y_2, \ldots) = D(T(y_0, y_1, y_2, \ldots)) = D(y_1, y_2, y_3, \ldots) = (0, y_1, y_2, y_3, \ldots)$.
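The asymmetry between $TD$ and $DT$ is easy to see on a finite prefix of a signal. The sketch below is an illustration only: it models a signal by a Python tuple (an assumption, since real signals are infinite), with `T` as the left shift and `D` as the right shift that inserts a zero.

```python
def T(seq):
    """Left shift: (y0, y1, y2, ...) -> (y1, y2, ...)."""
    return seq[1:]


def D(seq):
    """Right shift: (y0, y1, y2, ...) -> (0, y0, y1, ...)."""
    return (0,) + seq


y = (5, 7, 11, 13)   # hypothetical sample values
td = T(D(y))         # shifting right, then left, restores the signal
dt = D(T(y))         # shifting left first loses y0, which D replaces by 0
print(td, dt)
```

So `T(D(y))` reproduces `y`, while `D(T(y))` differs from `y` in its first entry whenever that entry is nonzero.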


Section 4.9, page 262 


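1. a.
$$\begin{array}{cc|l} \multicolumn{2}{c|}{\text{From:}} & \\ N & M & \text{To:} \\ \hline .7 & .6 & \text{News} \\ .3 & .4 & \text{Music} \end{array}$$

3. a.
$$\begin{array}{cc|l} \multicolumn{2}{c|}{\text{From:}} & \\ H & I & \text{To:} \\ \hline .95 & .45 & \text{Healthy} \\ .05 & .55 & \text{Ill} \end{array}$$

c. .925; use $\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$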

5. 

1 1 

^ SO 
• • 

i_i 

7. 

"1/4“ 

1/2 



1/4 


b. 


1 

0 


c. 33% 


b. 15%, 12.5% 


9. Yes, because P 2 has all positive entries. 


11. a. 


13. a. 


2/3 

1/3 

.9 
.1 


b. 2/3 


b. .10, no 


c. [M] At month 26, the last payment is $114.88. The total 
paid by the borrower is $11,364.88. 
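The MATLAB loop in part (b) of Exercise 23 translates directly to other languages. The Python version below is a sketch, not the book's code: the function name and the `history` list are choices made for this example, while the rate, payment, and starting balance come from part (a).

```python
def pay_down(balance=10000.0, rate=0.01, payment=450.0):
    """Apply y_{k+1} = 1.01*y_k - 450 while the balance exceeds one payment."""
    months = 0
    history = [(months, balance)]
    while balance > payment:
        balance = (1 + rate) * balance - payment
        months += 1
        history.append((months, balance))
    return months, balance, history


months, balance, history = pay_down()
# After the loop, `balance` is what remains to be paid off in the following month.
total_paid = months * 450.0 + balance
print(months, round(balance, 2), round(total_paid, 2))
```

The loop stops after 25 full payments with about $114.88 outstanding, which agrees with a final payment in month 26 and a total of about $11,364.88.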

25. $k^2 + c_1 \cdot (-4)^k + c_2$   27. $2 - 2k + c_1 \cdot 4^k + c_2 \cdot 2^{-k}$


15. [M] About 17.3% of the United States population 

17. a. The entries in a column of $P$ sum to 1. A column in the
matrix $P - I$ has the same entries as in $P$ except that
one of the entries is decreased by 1. Hence each column
sum is 0.

b. By (a), the bottom row of $P - I$ is the negative of the
sum of the other rows.

c. By (b) and the Spanning Set Theorem, the bottom row
of $P - I$ can be removed and the remaining $(n - 1)$
rows will still span the row space. Alternatively, use (a)
and the fact that row operations do not change the row
space. Let $A$ be the matrix obtained from $P - I$ by
adding to the bottom row all the other rows. By (a), the
row space is spanned by the first $(n - 1)$ rows of $A$.


d. By the Rank Theorem and (c), the dimension of the
column space of $P - I$ is less than $n$, and hence the null
space is nontrivial. Instead of the Rank Theorem, you
may use the Invertible Matrix Theorem, since $P - I$ is
a square matrix.

19. a. The product $S\mathbf{x}$ equals the sum of the entries in $\mathbf{x}$. For a
probability vector, this sum must be 1.

b. $P = [\,\mathbf{p}_1 \ \ \mathbf{p}_2 \ \ \cdots \ \ \mathbf{p}_n\,]$, where the $\mathbf{p}_i$ are probability
vectors. By matrix multiplication and part (a),

$$SP = [\,S\mathbf{p}_1 \ \ S\mathbf{p}_2 \ \ \cdots \ \ S\mathbf{p}_n\,] = [\,1 \ \ 1 \ \ \cdots \ \ 1\,] = S$$

c. By part (b), $S(P\mathbf{x}) = (SP)\mathbf{x} = S\mathbf{x} = 1$. Also, the
entries in $P\mathbf{x}$ are nonnegative (because $P$ and $\mathbf{x}$ have
nonnegative entries). Hence, by (a), $P\mathbf{x}$ is a probability
vector.


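21. [M]

a. To four decimal places,

$$P^4 = P^5 = \begin{bmatrix} .2816 & .2816 & .2816 & .2816 \\ .3355 & .3355 & .3355 & .3355 \\ .1819 & .1819 & .1819 & .1819 \\ .2009 & .2009 & .2009 & .2009 \end{bmatrix}, \quad \mathbf{q} = \begin{bmatrix} .2816 \\ .3355 \\ .1819 \\ .2009 \end{bmatrix}$$

Note that, due to round-off, the column sums are not 1.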
b. To four decimal places, 


G 80 = 


116 _ 


q = 


.7353 

.0882 

.1765 


c. Let $P$ be an $n \times n$ regular stochastic matrix, $\mathbf{q}$ the
steady-state vector of $P$, and $\mathbf{e}_1$ the first column of the
identity matrix. Then $P^k\mathbf{e}_1$ is the first column of $P^k$. By
Theorem 18, $P^k\mathbf{e}_1 \to \mathbf{q}$ as $k \to \infty$. Replacing $\mathbf{e}_1$ by the
other columns of the identity matrix, we conclude that
each column of $P^k$ converges to $\mathbf{q}$ as $k \to \infty$. Thus
$P^k \to [\,\mathbf{q} \ \ \mathbf{q} \ \ \cdots \ \ \mathbf{q}\,]$.
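The convergence of every column of $P^k$ to $\mathbf{q}$ can be watched on a small case. The sketch below uses the $2 \times 2$ matrix from Exercise 1(a) as a convenient example; the helper names and the choice of $k = 20$ are assumptions made for this illustration, and $\mathbf{q} = (2/3, 1/3)$ is the vector that satisfies $P\mathbf{q} = \mathbf{q}$ for this $P$.

```python
from fractions import Fraction as F

# Stochastic matrix from Exercise 1(a): columns sum to 1.
P = [[F(7, 10), F(6, 10)],
     [F(3, 10), F(4, 10)]]


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def matpow(A, k):
    """k-th power of a square matrix by repeated multiplication."""
    n = len(A)
    result = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, A)
    return result


q = [F(2, 3), F(1, 3)]                       # satisfies P q = q
Pq = [sum(P[i][j] * q[j] for j in range(2)) for i in range(2)]
Pk = matpow(P, 20)
column_sums = [sum(Pk[i][j] for i in range(2)) for j in range(2)]
max_gap = max(abs(Pk[i][j] - q[i]) for i in range(2) for j in range(2))
print(column_sums, float(max_gap))
```

Exact fractions keep the column sums equal to 1, so the round-off effect mentioned in part (a) does not appear here.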


Chapter 4 Supplementary Exercises, page 264 


1. a. T
b. T
c. F
d. F
e. T
f. T
g. F
h. F
i. T
j. F
k. F
l. F
m. T
n. F
o. T
p. T
q. F
r. T
s. T
t. F






3. The set of all $(b_1, b_2, b_3)$ satisfying $b_1 + 2b_2 + b_3 = 0$.

5. The vector $\mathbf{p}_1$ is not zero and $\mathbf{p}_2$ is not a multiple of $\mathbf{p}_1$, so
keep both of these vectors. Since $\mathbf{p}_3 = 2\mathbf{p}_1 + 2\mathbf{p}_2$, discard
$\mathbf{p}_3$. Since $\mathbf{p}_4$ has a $t^2$ term, it cannot be a linear combination
of $\mathbf{p}_1$ and $\mathbf{p}_2$, so keep $\mathbf{p}_4$. Finally, $\mathbf{p}_5 = \mathbf{p}_1 + \mathbf{p}_4$, so discard
$\mathbf{p}_5$. The resulting basis is $\{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_4\}$.

7. You would have to know that the solution set of the
homogeneous system is spanned by two solutions. In this
case, the null space of the $18 \times 20$ coefficient matrix $A$ is at
most two-dimensional. By the Rank Theorem,
$\dim \operatorname{Col} A \ge 20 - 2 = 18$, which means that $\operatorname{Col} A = \mathbb{R}^{18}$,
because $A$ has 18 rows, and every equation $A\mathbf{x} = \mathbf{b}$ is
consistent.

9. Let A be the standard m x n matrix of the transformation T. 


11 . 


".7354 

.7348 

.7351 


12. 

.0881 

.0887 

.0884 

9 


_.1764 

.1766 

• 

05 

1_ 


13. 


".7353 

.7353 

.7353" 


G 117 = 

.0882 

.0882 

.0882 



.1765 

.1765 

.1765 

15. 


a 


If $T$ is one-to-one, then the columns of $A$ are linearly
independent (Theorem 12 in Section 1.9), so
$\dim \operatorname{Nul} A = 0$. By the Rank Theorem,
$\dim \operatorname{Col} A = \operatorname{rank} A = n$. Since the range of $T$ is $\operatorname{Col} A$,
the dimension of the range of $T$ is $n$.


b 


If $T$ is onto, then the columns of $A$ span $\mathbb{R}^m$ (Theorem
12 in Section 1.9), so $\dim \operatorname{Col} A = m$. By the Rank
Theorem, $\dim \operatorname{Nul} A = n - \dim \operatorname{Col} A = n - m$. Since
the kernel of $T$ is $\operatorname{Nul} A$, the dimension of the kernel of
$T$ is $n - m$.


If S is a finite spanning set for V , then a subset of S — say 
S' —is a basis for V. Since S' must span V, S' cannot be a 
proper subset of S because of the minimality of S . Thus 
S' = S , which proves that S is a basis for V. 

12. a. Hint: Any y in Col AB has the form y = ABx for some 

x. 


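$\operatorname{rank} A = \operatorname{rank} P^{-1}PA \le \operatorname{rank} PA$. Thus
$\operatorname{rank} PA = \operatorname{rank} A$.

The equation $AB = 0$ shows that each column of $B$ is in
$\operatorname{Nul} A$. Since $\operatorname{Nul} A$ is a subspace, all linear combinations of
the columns of $B$ are in $\operatorname{Nul} A$, so $\operatorname{Col} B$ is a subspace of
$\operatorname{Nul} A$. By Theorem 11 in Section 4.5,
$\dim \operatorname{Col} B \le \dim \operatorname{Nul} A$. Applying the Rank Theorem, we
find that

$$n = \operatorname{rank} A + \dim \operatorname{Nul} A \ge \operatorname{rank} A + \operatorname{rank} B$$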

















17. a. Let $A_1$ consist of the $r$ pivot columns in $A$. The columns
of $A_1$ are linearly independent. So $A_1$ is an $m \times r$ matrix with
rank $r$.

b. By the Rank Theorem applied to $A_1$, the dimension of
$\operatorname{Row} A_1$ is $r$, so $A_1$ has $r$ linearly independent rows. Use
them to form $A_2$. Then $A_2$ is $r \times r$ with linearly
independent rows. By the Invertible Matrix Theorem,
$A_2$ is invertible.


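19. $$[\,B \ \ AB \ \ A^2B\,] = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -.9 & .81 \\ 1 & .5 & .25 \end{bmatrix} \sim \begin{bmatrix} 1 & -.9 & .81 \\ 0 & 1 & 0 \\ 0 & 0 & -.56 \end{bmatrix}$$

This matrix has rank 3, so the pair $(A, B)$ is controllable.

21. [M] $\operatorname{rank} [\,B \ \ AB \ \ A^2B \ \ A^3B\,] = 3$. The pair $(A, B)$ is
not controllable.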
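The rank claim in Exercise 19 can be confirmed by row reduction. The sketch below is one way to do it with exact fractions; the `rank` helper is written for this example (it is not a library routine), and the matrix entered is the $[\,B \ \ AB \ \ A^2B\,]$ matrix displayed in that answer.

```python
from fractions import Fraction as F


def rank(rows):
    """Rank by forward elimination on a copy of the matrix (exact arithmetic)."""
    M = [list(r) for r in rows]
    found = 0
    for col in range(len(M[0])):
        pivot = next((r for r in range(found, len(M)) if M[r][col] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        M[found], M[pivot] = M[pivot], M[found]
        for r in range(found + 1, len(M)):
            factor = M[r][col] / M[found][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[found])]
        found += 1
        if found == len(M):
            break
    return found


controllability = [[F(0), F(1), F(0)],
                   [F(1), F(-9, 10), F(81, 100)],
                   [F(1), F(1, 2), F(1, 4)]]
r = rank(controllability)
print(r)  # the pair is controllable when this equals the number of rows
```

For Exercise 21 the same helper would be applied to the matrix with four block columns, whose entries are not listed in this answer.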


Chapter 5 

Section 5.1, page 273 


1. Yes 


3. No 


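5. Yes, $\lambda = 0$

9. $\lambda = 1$: $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$; $\lambda = 5$: $\begin{bmatrix} 2 \\ 1 \end{bmatrix}$

11. $\begin{bmatrix} -1 \\ 3 \end{bmatrix}$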


13. A = 1: 

" 0 " 

1 

; A = 2: 

"-1" 

2 

; A = 3: 

"-I" 

1 


0 


2 


1 



i 

CM 

1_ 


1 

_1 

15. 

1 

5 

0 


1 

O 

_1 


1 


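17. 0, 2, $-1$

19. 0. Justify your answer.

21. See the Study Guide, after you have written your answers.

23. Hint: Use Theorem 2.

25. Hint: Use the equation $A\mathbf{x} = \lambda\mathbf{x}$ to find an equation
involving $A^{-1}$.

27. Hint: For any $\lambda$, $(A - \lambda I)^T = A^T - \lambda I$. By a theorem
(which one?), $A^T - \lambda I$ is invertible if and only if $A - \lambda I$ is
invertible.

29. Let $\mathbf{v}$ be the vector in $\mathbb{R}^n$ whose entries are all 1's. Then
$A\mathbf{v} = s\mathbf{v}$.

31. Hint: If $A$ is the standard matrix of $T$, look for a nonzero
vector $\mathbf{v}$ (a point in the plane) such that $A\mathbf{v} = \mathbf{v}$.

33. a. $\mathbf{x}_{k+1} = c_1\lambda^{k+1}\mathbf{u} + c_2\mu^{k+1}\mathbf{v}$

b. $$\begin{aligned} A\mathbf{x}_k &= A(c_1\lambda^k\mathbf{u} + c_2\mu^k\mathbf{v}) \\ &= c_1\lambda^k A\mathbf{u} + c_2\mu^k A\mathbf{v} && \text{Linearity} \\ &= c_1\lambda^k\lambda\mathbf{u} + c_2\mu^k\mu\mathbf{v} && \mathbf{u} \text{ and } \mathbf{v} \text{ are eigenvectors.} \end{aligned}$$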



37. [M] A = 3: 


1 

_1 


i 

<N 

1_ 


"-1" 

-2 

; A = 13: 

1 

5 

0 

i 

1_ 


1 

O 

_1 


1 


. You can 


speed up your calculations with the program nulbasis 
discussed in the Study Guide. 


39. [M] A = -2: 


2 

7 

5 

5 


3 

7 

5 

0 





0_ 



5_ 


2" 


"-1 



"2" 


-1 


1 


0 


1 

A = 5: 

1 


0 

5 

0 

Yes, 

1 


0 


1 


0 


-1 


0 


0 


1 


Section 5.2, page 281 


1. $\lambda^2 - 4\lambda - 45$; 9, $-5$

3. $\lambda^2 - 2\lambda - 1$; $1 \pm \sqrt{2}$

5. $\lambda^2 - 6\lambda + 9$; 3

7. $\lambda^2 - 9\lambda + 32$; no real eigenvalues

9. $-\lambda^3 + 4\lambda^2 - 9\lambda - 6$

11. $-\lambda^3 + 9\lambda^2 - 26\lambda + 24$

13. $-\lambda^3 + 18\lambda^2 - 95\lambda + 150$   15. 4, 3, 3, 1

17. 3, 3, 1, 1, 0

19. Hint: The equation given holds for all $\lambda$.

21. The Study Guide has hints.

23. Hint: Find an invertible matrix $P$ so that $RQ = P^{-1}AP$.


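25. a. $\{\mathbf{v}_1, \mathbf{v}_2\}$, where $\mathbf{v}_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$ is an eigenvector for $\lambda = .3$

b. $\mathbf{x}_0 = \mathbf{v}_1 - \frac{1}{14}\mathbf{v}_2$

c. $\mathbf{x}_1 = \mathbf{v}_1 - \frac{1}{14}(.3)\mathbf{v}_2$, $\mathbf{x}_2 = \mathbf{v}_1 - \frac{1}{14}(.3)^2\mathbf{v}_2$, and
$\mathbf{x}_k = \mathbf{v}_1 - \frac{1}{14}(.3)^k\mathbf{v}_2$. As $k \to \infty$, $(.3)^k \to 0$ and
$\mathbf{x}_k \to \mathbf{v}_1$.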


27. a. $A\mathbf{v}_1 = \mathbf{v}_1$, $A\mathbf{v}_2 = .5\mathbf{v}_2$, $A\mathbf{v}_3 = .2\mathbf{v}_3$. (This also shows
that the eigenvalues of $A$ are 1, .5, and .2.)

b. $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is linearly independent because the
eigenvectors correspond to distinct eigenvalues (Theorem 2).
Since there are 3 vectors in the set, the set is a basis for
$\mathbb{R}^3$. So there exist (unique) constants such that


= Xfc+i 


$$\mathbf{x}_0 = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$$















































Then 


31. Hint: Construct a suitable 2x2 triangular matrix 


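$$\mathbf{w}^T\mathbf{x}_0 = c_1\mathbf{w}^T\mathbf{v}_1 + c_2\mathbf{w}^T\mathbf{v}_2 + c_3\mathbf{w}^T\mathbf{v}_3 \qquad (*)$$

Since $\mathbf{x}_0$ and $\mathbf{v}_1$ are probability vectors and since the
entries in $\mathbf{v}_2$ and in $\mathbf{v}_3$ each sum to 0, $(*)$ shows that
$1 = c_1$.

c. By (b),

$$\mathbf{x}_0 = \mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$$

Using (a),

$$\begin{aligned} \mathbf{x}_k = A^k\mathbf{x}_0 &= A^k\mathbf{v}_1 + c_2A^k\mathbf{v}_2 + c_3A^k\mathbf{v}_3 \\ &= \mathbf{v}_1 + c_2(.5)^k\mathbf{v}_2 + c_3(.2)^k\mathbf{v}_3 \\ &\to \mathbf{v}_1 \quad \text{as } k \to \infty \end{aligned}$$

29. [M] Report your results and conclusions. You can avoid
tedious calculations if you use the program gauss
discussed in the Study Guide.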


Section 5.3, page 288 


1. $\begin{bmatrix} 226 & -525 \\ 90 & -209 \end{bmatrix}$   3. $\begin{bmatrix} a^k & 0 \\ 3(a^k - b^k) & b^k \end{bmatrix}$


5. A = 5: 


~r 


r 


1 

<N 

1_ 

1 

; A = 1: 

0 


-1 

1 


-1 


1 

O 

_1 


When an answer involves a diagonalization, $A = PDP^{-1}$, the
factors $P$ and $D$ are not unique, so your answer may differ from
that given here.


7. $P = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix}$, $D = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$

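9. Not diagonalizable

11. $P = \begin{bmatrix} 1 & 2 & 1 \\ 3 & 3 & 1 \\ 4 & 3 & 1 \end{bmatrix}$, $D = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

13. $P = \begin{bmatrix} -1 & 2 & 1 \\ -1 & -1 & 0 \\ 1 & 0 & 1 \end{bmatrix}$, $D = \begin{bmatrix} 5 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

15. $P = \begin{bmatrix} -1 & -4 & -2 \\ 1 & 0 & -1 \\ 0 & 1 & 1 \end{bmatrix}$, $D = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

17. Not diagonalizable

19. $P = \begin{bmatrix} 1 & 3 & -1 & -1 \\ 0 & 2 & -1 & 2 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$, $D = \begin{bmatrix} 5 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix}$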


21. See the Study Guide. 


23. Yes. (Explain why.) 


25. No, A must be diagonalizable. (Explain why.) 

27. Hint: Write $A = PDP^{-1}$. Since $A$ is invertible, 0 is not an
eigenvalue of $A$, so $D$ has nonzero entries on its diagonal.


33. [M] P = 


D = 


5 

0 

0 

0 


35. [M] P = 


D = 


5 

0 

0 

0 

0 


2 

1 

-1 

2 

0 

1 

0 

0 

6 

-1 

-3 

3 

0 

0 

5 

0 

0 

0 


2 

-1 

-7 

2 

0 

0 

-2 

0 

3 

-1 

-3 

0 

3 

0 

0 

3 

0 

0 


1 

1 

1 

0 

0" 

0 

0 

— 2 _ 
2 

-1 

-4 

-1 

4 

0 

0 

0 

1 

0 


6 

-3 

0 

4 


4 
-3 
-2 

5 
0 

0 “ 

0 

0 

0 

1 


3 

-1 

-4 

0 

5 


Section 5.4, page 295


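1. $\begin{bmatrix} 3 & -1 & 0 \\ -5 & 6 & 4 \end{bmatrix}$

3. a. $T(\mathbf{e}_1) = -\mathbf{b}_2 + \mathbf{b}_3$, $T(\mathbf{e}_2) = -\mathbf{b}_1 - \mathbf{b}_3$,
$T(\mathbf{e}_3) = \mathbf{b}_1 - \mathbf{b}_2$

b. $[T(\mathbf{e}_1)]_{\mathcal{B}} = \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix}$, $[T(\mathbf{e}_2)]_{\mathcal{B}} = \begin{bmatrix} -1 \\ 0 \\ -1 \end{bmatrix}$, $[T(\mathbf{e}_3)]_{\mathcal{B}} = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}$

c. $\begin{bmatrix} 0 & -1 & 1 \\ -1 & 0 & -1 \\ 1 & -1 & 0 \end{bmatrix}$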


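5. a. $10 - 3t + 4t^2 + t^3$

b. For any $\mathbf{p}$, $\mathbf{q}$ in $\mathbb{P}_2$ and any scalar $c$,

$$\begin{aligned} T[\mathbf{p}(t) + \mathbf{q}(t)] &= (t + 5)[\mathbf{p}(t) + \mathbf{q}(t)] \\ &= (t + 5)\mathbf{p}(t) + (t + 5)\mathbf{q}(t) \\ &= T[\mathbf{p}(t)] + T[\mathbf{q}(t)] \\ T[c \cdot \mathbf{p}(t)] &= (t + 5)[c \cdot \mathbf{p}(t)] = c \cdot (t + 5)\mathbf{p}(t) \\ &= c \cdot T[\mathbf{p}(t)] \end{aligned}$$

c. $\begin{bmatrix} 5 & 0 & 0 \\ 1 & 5 & 0 \\ 0 & 1 & 5 \\ 0 & 0 & 1 \end{bmatrix}$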

29. One answer is P\ = 

1 1" 

, whose columns are 


"3 

0 

0" 

-2 -1 

7. 

5 

-2 

0 

eigenvectors corresponding to the eigenvalues in D 1 . 


0 

4 

1 


























































9. a. 


11 . 


19 


2 

5 

8 


In Exercises 13-20, other answers are possible. Any $P$ that
makes $P^{-1}AP$ equal to the given $C$ or to $C^T$ is a satisfactory
answer. First find $P$; then compute $P^{-1}AP$.


b. Hint: Compute $T(\mathbf{p} + \mathbf{q})$ and $T(c \cdot \mathbf{p})$ for arbitrary $\mathbf{p}$, $\mathbf{q}$
in $\mathbb{P}_2$ and an arbitrary scalar $c$.


13. P = 


1 -1 

1 0 


, C = 


2 

1 


-1 

2 


c. 


1 

0 


1 -1 
1 0 

1 1 

5' 

1 


1 

0 

1 


15. P = 


1 

2 


3 

0 


, c = 


13. bi = 


1 

1 


hi = 


1 

3 


17. P = 


2 -1 
5 0 


,C = 


2 -3 

3 2 

-.6 -.8 
.8 -.6 


15. bi = 


~-2~ 

1 

, b 2 = 

V 

1 

19. P = 

"2 -1" 
2 0 

. c = 

-.96 -.28" 
.28 .96 


17. a 


b 


$A\mathbf{b}_1 = 2\mathbf{b}_1$, so $\mathbf{b}_1$ is an eigenvector of $A$. However, $A$
has only one eigenvalue, $\lambda = 2$, and the eigenspace is
only one-dimensional, so $A$ is not diagonalizable.

2 -1 
0 2 


By definition, if $A$ is similar to $B$, there exists an invertible
matrix $P$ such that $P^{-1}AP = B$. (See Section 5.2.) Then
$B$ is invertible because it is the product of invertible
matrices. To show that $A^{-1}$ is similar to $B^{-1}$, use the
equation $P^{-1}AP = B$. See the Study Guide.

21. Hint: Review Practice Problem 2.

23. Hint: Compute $B(P^{-1}\mathbf{x})$.

25. Hint: Write $A = PBP^{-1} = (PB)P^{-1}$, and use the trace
property.

27. For each $j$, $I(\mathbf{b}_j) = \mathbf{b}_j$. Since the standard coordinate
vector of any vector in $\mathbb{R}^n$ is just the vector itself,
$[I(\mathbf{b}_j)]_{\mathcal{E}} = \mathbf{b}_j$. Thus the matrix for $I$ relative to $\mathcal{B}$ and the
standard basis $\mathcal{E}$ is simply $[\,\mathbf{b}_1 \ \ \mathbf{b}_2 \ \ \cdots \ \ \mathbf{b}_n\,]$. This matrix
is precisely the change-of-coordinates matrix $P_{\mathcal{B}}$ defined in
Section 4.4.

29. The $\mathcal{B}$-matrix for the identity transformation is $I_n$, because
the $\mathcal{B}$-coordinate vector of the $j$th basis vector $\mathbf{b}_j$ is the $j$th
column of $I_n$.


-l 


31. [M] 


7 -2 -6 

0 -4 -6 
0 0-1 


Section 5.5, page 302 


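1. $\lambda = 2 + i$, $\begin{bmatrix} -1 + i \\ 1 \end{bmatrix}$; $\lambda = 2 - i$, $\begin{bmatrix} -1 - i \\ 1 \end{bmatrix}$

3. $\lambda = 2 + 3i$, $\begin{bmatrix} 1 - 3i \\ 2 \end{bmatrix}$; $\lambda = 2 - 3i$, $\begin{bmatrix} 1 + 3i \\ 2 \end{bmatrix}$

5. $\lambda = 2 + 2i$, $\begin{bmatrix} 1 \\ 2 + 2i \end{bmatrix}$; $\lambda = 2 - 2i$, $\begin{bmatrix} 1 \\ 2 - 2i \end{bmatrix}$

7. $\lambda = \sqrt{3} \pm i$, $\varphi = \pi/6$ radian, $r = 2$
9. $\lambda = -\sqrt{3}/2 \pm (1/2)i$, $\varphi = -5\pi/6$ radians, $r = 1$
11. $\lambda = .1 \pm .1i$, $\varphi = -\pi/4$ radian, $r = \sqrt{2}/10$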
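The scale factor $r$ and angle $\varphi$ in Exercises 7-11 are the polar form of one eigenvalue of each conjugate pair. The sketch below recomputes them with the standard library; which member of each pair to use (the sign of the imaginary part) is an assumption made here so that the angles match the stated answers.

```python
import cmath
import math

# One eigenvalue from each conjugate pair; the sign of the imaginary part is
# chosen (as an assumption) to reproduce the angle given in the answer.
eigenvalues = {
    7: complex(math.sqrt(3), 1),
    9: complex(-math.sqrt(3) / 2, -0.5),
    11: complex(0.1, -0.1),
}

# cmath.polar returns (modulus, phase) with the phase in (-pi, pi].
polar = {n: cmath.polar(z) for n, z in eigenvalues.items()}
for n, (r, phi) in polar.items():
    print(n, round(r, 4), round(phi / math.pi, 4), "* pi")
```

Taking the conjugate eigenvalue instead would flip the sign of $\varphi$ and leave $r$ unchanged.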


21 . y = 


2 

-1 +2/ 

1 

<N 

1 _ 

-1 +2/ 

5 

5 


rp - 

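23. (a) Properties of conjugates and the fact that $\overline{\mathbf{x}}^T = \overline{\mathbf{x}^T}$;
(b) $\overline{A\mathbf{x}} = A\overline{\mathbf{x}}$ and $A$ is real; (c) because $\mathbf{x}^TA\overline{\mathbf{x}}$ is a scalar and
hence may be viewed as a $1 \times 1$ matrix; (d) properties of
transposes; (e) $A^T = A$, definition of $q$

25. Hint: First write $\mathbf{x} = \operatorname{Re}\mathbf{x} + i(\operatorname{Im}\mathbf{x})$.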


27. [M] P = 


1 -1 

4 0 

0 0 

2 0 


2 

0 

3 

4 


0 

2 

-1 

0 


c = 


2 

5 

0 

0 


5 

2 

0 

0 


0 0 

0 0 

.3 -.1 
.1 .3 


Other choices are possible, but $C$ must equal $P^{-1}AP$.


Section 5.6, page 311 


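1. a. Hint: Find $c_1$, $c_2$ such that $\mathbf{x}_0 = c_1\mathbf{v}_1 + c_2\mathbf{v}_2$. Use this
representation and the fact that $\mathbf{v}_1$ and $\mathbf{v}_2$ are
eigenvectors of $A$ to compute $\mathbf{x}_1 = \begin{bmatrix} 49/3 \\ 41/3 \end{bmatrix}$.

b. In general, $\mathbf{x}_k = 5(3)^k\mathbf{v}_1 - 4\left(\tfrac{1}{3}\right)^k\mathbf{v}_2$ for $k \ge 0$.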
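The formula in part (b) can be evaluated directly once the eigenvectors are known. The sketch below is hedged: the vectors $\mathbf{v}_1 = (1, 1)$ and $\mathbf{v}_2 = (-1, 1)$ are not stated in this answer and are assumed here because they reproduce the value of $\mathbf{x}_1$ given in part (a); the weights 5 and $-4$ and the eigenvalues 3 and $1/3$ come from part (b).

```python
from fractions import Fraction as F

# Assumed eigenvectors (not given in the answer text above).
v1 = [F(1), F(1)]
v2 = [F(-1), F(1)]
lam1, lam2 = F(3), F(1, 3)    # eigenvalues appearing in part (b)
c1, c2 = F(5), F(-4)          # weights appearing in part (b)


def x(k):
    """x_k = c1 * lam1^k * v1 + c2 * lam2^k * v2."""
    return [c1 * lam1**k * a + c2 * lam2**k * b for a, b in zip(v1, v2)]


print(x(0), x(1))
```

With these assumed vectors, `x(1)` equals $(49/3, 41/3)$, matching part (a).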


3. When $p = .2$, the eigenvalues of $A$ are .9 and .7, and

$$\mathbf{x}_k = c_1(.9)^k\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2(.7)^k\begin{bmatrix} 2 \\ 1 \end{bmatrix} \to \mathbf{0} \quad \text{as } k \to \infty$$

The higher predation rate cuts down the owls' food supply,
and eventually both predator and prey populations perish.

5. If $p = .325$, the eigenvalues are 1.05 and .55. Since
$1.05 > 1$, both populations will grow at 5% per year. An
eigenvector for 1.05 is $(6, 13)$, so eventually there will be
approximately 6 spotted owls to every 13 (thousand) flying
squirrels.

7. a. The origin is a saddle point because $A$ has one
eigenvalue larger than 1 and one smaller than 1 (in
absolute value).


b. The direction of greatest attraction is given by the
eigenvector corresponding to the eigenvalue 1/3,
namely, $\mathbf{v}_2$. All vectors that are multiples of $\mathbf{v}_2$ are
attracted to the origin. The direction of greatest
repulsion is given by the eigenvector $\mathbf{v}_1$. All multiples
of $\mathbf{v}_1$ are repelled.

c. See the Study Guide.

9. Saddle point; eigenvalues: 2, .5; direction of greatest
repulsion: the line through $(0, 0)$ and $(-1, 1)$; direction of
greatest attraction: the line through $(0, 0)$ and $(1, 4)$

11. Attractor; eigenvalues: .9, .8; greatest attraction: line
through $(0, 0)$ and $(5, 4)$

13. Repellor; eigenvalues: 1.2, 1.1; greatest repulsion: line
through $(0, 0)$ and $(3, 4)$



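15. $\mathbf{x}_k = \mathbf{v}_1 + .1(.5)^k\begin{bmatrix} 2 \\ -3 \\ 1 \end{bmatrix} + .3(.2)^k\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \to \mathbf{v}_1$ as $k \to \infty$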


17. a. $A = \begin{bmatrix} 0 & 1.6 \\ .3 & .8 \end{bmatrix}$

b. The population is growing because the largest
eigenvalue of $A$ is 1.2, which is larger than 1 in
magnitude. The eventual growth rate is 1.2, which is
20% per year. The eigenvector $(4, 3)$ for $\lambda_1 = 1.2$
shows that there will be 4 juveniles for every
3 adults.

c. [M] The juvenile-adult ratio seems to stabilize after
about 5 or 6 years. The Study Guide describes how to
construct a matrix program to generate a data matrix
whose columns list the numbers of juveniles and adults
each year. Graphing the data is also discussed.
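The eigenvalue and the stabilizing ratio in Exercise 17 can be reproduced in a few lines. This is a sketch rather than the Study Guide's program: the starting population `(15, 10)` is a hypothetical choice, and the eigenvalues are found from the trace and determinant of the $2 \times 2$ matrix in part (a).

```python
import math

A = [[0.0, 1.6],
     [0.3, 0.8]]          # stage matrix from part (a)


def eigenvalues_2x2(M):
    """Roots of lambda^2 - (tr M) lambda + det M (assumes they are real)."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    root = math.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2


def step(M, v):
    """One year of the model: v -> M v."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]


lam_max, lam_min = eigenvalues_2x2(A)

v = [15.0, 10.0]          # hypothetical initial (juveniles, adults)
ratios = []
for year in range(12):
    v = step(A, v)
    ratios.append(v[0] / v[1])   # juvenile-to-adult ratio each year

print(lam_max, lam_min)
print([round(r, 3) for r in ratios])
```

The printed ratios settle near $4/3$, consistent with the eigenvector $(4, 3)$; a different starting vector changes the early values but not the limit, provided it has a component along that eigenvector.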

Section 5.7, page 319 


1 . x(t) = - 
v 7 2 


3. -- 


-1 

LO 

_1 

e 4t - - 

1 

1 _ 

1 — 

1 _ 

2 

1 

_1 


.2 1 


"-3" 

, 9 

e -\ — 

"-1" 

1 

2 

1 


e t . The origin is a saddle point. 


The direction of greatest attraction is the line through 
(—1,1) and the origin. The direction of greatest repulsion is 
the line through (—3,1) and the origin. 


1 

5. — 


" 1 " 

e 4 ' + 1 - 

" 1 " 

3 

2 

1 


e 6t . The origin is a repellor. The 


direction of greatest repulsion is the line through (1,1) and 
the origin. 


7. Set P = 


A = PDP 1 . Substituting x = P y into x' = Ax, we have 


17 


y' = D y, 


or 




"4 

0 " 

>i(0‘ 

jii 0. 


_0 

6 _ 



9. (complex solution): 


c 1 


1 -i 
1 


e(~ 2+i)t + c 2 


1 + i 
i 


(—2 —i)t 


(real solution): 



cos t + sin / 

—It 

sin t — cos t 

Cl 

cos t 

e +c 2 

-1 

c 

• rH 

C/5 

_1 


-2 1 


The trajectories spiral in toward the origin. 


11 . (complex): c\ 


(real): 


-3 + 3 i 
2 


e 3lt + c 2 


-3 - 3 i 
2 


3 it 


Cl 


3 cos 3/ — 3 sin 3/ 
2 cos 3 1 


+ c 2 


—3 sin 3/ + 3 cos 3 1 
2 sin 3 1 


The trajectories are ellipses about the origin. 


13. (complex): C\ 


(real): C\ 


1 + i 
2 

cos 3 1 — sin 3 1 
2 cos 3 1 


£ ( 1 +3l )t -\- C 2 


l-l 

2 


g(l-3 i)t 


e l + c 2 


sin 3 1 + cos 3 1 
2 sin 3 1 


The trajectories spiral out, away from the origin. 


t 



~~r 


"- 6 “ 


~-4“ 

15. [M] x(/) = ci 

0 

1 

e 2t + c 2 

1 

5 

cn 

+ 

1 

1 

4 


The origin is a saddle point. A solution with c 3 
attracted to the origin. A solution with C\ — c 2 
repelled. 


Ois 

Ois 


[M] (complex): 
'-3 

ci 1 

1 



" 23 - 34/ " 

e r + c 2 

-9+ 14/ 


3 


e (5+2i)t _|_ 




23 + 34/ 
-9-14/ 
3 


e (5—2i)t 



"-3“ 



23 cos 2/ + 34 sin 2/ 

(real): C\ 

1 

e l 

+ c 2 

—9 cos 2t — 14 sin 2/ 


1 



3 cos 2/ 


e 5t + 


5t 


23 sin 2/ — 34 cos 2t 
c 3 —9 sin 2 / + 14 cos 2/ 

3 sin 2t 

The origin is a repellor. The trajectories spiral outward, 
away from the origin. 


"1 

_3 

r 

i 

and D = 

"4 0" 

_° 6 _ 

. Then 

19. [M] A = 

"-2 3/4" 

1 -1 


d 

dt 


'vi(0‘ 

5 

"1" 

0 —.5t ^ 

"-3" 

_v 2 (r)_ 

” 2 

2 

c — — 

2 

2 


—2.5/ 


(P y) = MP y) 


Py = PDP~\P y) = PD y 


21. [M] A = 


1 

5 


-8 

-5 


Left-multiplying by P 1 gives 




—20 sin 6t 

_V C (0. 


15 cos 6/ — 5 sin 6/ 


- 3 1 

























































































Section 5.8, page 326 





7. 

9. 

11 . 

13. 

15. 

17. 

19. 



Eigenvector: x 4 = 

1 

_.3326_ 

, or Ax 4 = 

"4.9978 
_ 1.6652 

A % 4.9978 

Eigenvector: x 4 = 

".5188“ 

1 

, or Ax 4 = 

".4594" 
.9 °75_ 


A % .9075 


5 




4.0015 

-5.0020 


estimated A = —5.0020 


5 


[M] x k : 

.15 

1 


1 

.9565 


.9932 

1 


1 

.9990 


.9998 

1 

Ak' 

11.5, 

12.78, 


12.96, 


12.9948, 

12.9990 


[M] $\mu_5 = 8.4233$, $\mu_6 = 8.4246$; actual value: 8.42443
(accurate to 5 places)


$\mu_k$: 5.8000, 5.9655, 5.9942, 5.9990 ($k = 1, 2, 3, 4$);
$R(\mathbf{x}_k)$: 5.9655, 5.9990, 5.99997, 5.9999993


Yes, but the sequences may converge very slowly. 


Hint: Write $A\mathbf{x} - \alpha\mathbf{x} = (A - \alpha I)\mathbf{x}$, and use the fact that
$(A - \alpha I)$ is invertible when $\alpha$ is not an eigenvalue of $A$.


[M] $\nu_0 = 3.3384$, $\nu_1 = 3.32119$ (accurate to 4 places with
rounding), $\nu_2 = 3.3212209$. Actual value: 3.3212201
(accurate to 7 places)


a. $\mu_6 = 30.2887 = \mu_7$ to four decimal places. To six
places, the largest eigenvalue is 30.288685, with
eigenvector $(.957629, .688937, 1, .943782)$.

b. The inverse power method (with a = 0) produces 
fiy 1 = .010141,/zy 1 = .010150. To seven places, the 
smallest eigenvalue is .0101500, with eigenvector 
(—.603972,1, —.251135, .148953). The reason for the 
rapid convergence is that the next-to-smallest 
eigenvalue is near .85. 

a. If the eigenvalues of $A$ are all less than 1 in magnitude,
and if $\mathbf{x} \ne \mathbf{0}$, then $A^k\mathbf{x}$ is approximately an eigenvector
for large $k$.

b. If the strictly dominant eigenvalue is 1, and if $\mathbf{x}$ has a
component in the direction of the corresponding
eigenvector, then $\{A^k\mathbf{x}\}$ will converge to a multiple of
that eigenvector.

c. If the eigenvalues of $A$ are all greater than 1 in
magnitude, and if $\mathbf{x}$ is not an eigenvector, then the
distance from $A^k\mathbf{x}$ to the nearest eigenvector will
increase as $k \to \infty$.


Chapter 5 Supplementary Exercises, page 328 


1. a. T
b. F
c. T
d. F
e. T
f. T
g. F
h. T
i. F
j. T
k. F
l. F
m. F
n. T
o. F
p. T
q. F
r. T
s. F
t. T
u. T
v. T
w. F
x. T



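3. a. Suppose $A\mathbf{x} = \lambda\mathbf{x}$, with $\mathbf{x} \ne \mathbf{0}$. Then
$(5I - A)\mathbf{x} = 5\mathbf{x} - A\mathbf{x} = 5\mathbf{x} - \lambda\mathbf{x} = (5 - \lambda)\mathbf{x}$. The
eigenvalue is $5 - \lambda$.

b. $(5I - 3A + A^2)\mathbf{x} = 5\mathbf{x} - 3A\mathbf{x} + A(A\mathbf{x}) = 5\mathbf{x} - 3\lambda\mathbf{x} + \lambda^2\mathbf{x} = (5 - 3\lambda + \lambda^2)\mathbf{x}$. The eigenvalue is $5 - 3\lambda + \lambda^2$.

5. Suppose $A\mathbf{x} = \lambda\mathbf{x}$, with $\mathbf{x} \ne \mathbf{0}$. Then

$$\begin{aligned} p(A)\mathbf{x} &= (c_0I + c_1A + c_2A^2 + \cdots + c_nA^n)\mathbf{x} \\ &= c_0\mathbf{x} + c_1A\mathbf{x} + c_2A^2\mathbf{x} + \cdots + c_nA^n\mathbf{x} \\ &= c_0\mathbf{x} + c_1\lambda\mathbf{x} + c_2\lambda^2\mathbf{x} + \cdots + c_n\lambda^n\mathbf{x} = p(\lambda)\mathbf{x} \end{aligned}$$

So $p(\lambda)$ is an eigenvalue of the matrix $p(A)$.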
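The statement that $p(\lambda)$ is an eigenvalue of $p(A)$ can be tried on a concrete matrix. In the sketch below, the matrix `A` and its eigenpairs are hypothetical test data (not taken from an exercise), and the polynomial is the $p(t) = 5 - 3t + t^2$ from Exercise 3(b).

```python
def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def p_of_matrix(A):
    """p(A) = 5I - 3A + A^2 for a square matrix A."""
    n = len(A)
    A2 = matmul(A, A)
    return [[5 * (i == j) - 3 * A[i][j] + A2[i][j] for j in range(n)]
            for i in range(n)]


def p(t):
    return 5 - 3 * t + t * t


def apply(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


A = [[1, 0],
     [6, -1]]                                # hypothetical test matrix
eigenpairs = [(1, [1, 3]), (-1, [0, 1])]     # A x = lambda x for each pair

pA = p_of_matrix(A)
checks = [apply(pA, x) == [p(lam) * xi for xi in x] for lam, x in eigenpairs]
print(pA, checks)
```

Each eigenvector of `A` is also an eigenvector of `p(A)`, with eigenvalue `p(lam)`.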

7. If $A = PDP^{-1}$, then $p(A) = Pp(D)P^{-1}$, as shown in
Exercise 6. If the $(j, j)$ entry in $D$ is $\lambda$, then the $(j, j)$
entry in $D^k$ is $\lambda^k$, and so the $(j, j)$ entry in $p(D)$ is $p(\lambda)$. If
$p$ is the characteristic polynomial of $A$, then $p(\lambda) = 0$ for
each diagonal entry of $D$, because these entries in $D$ are the
eigenvalues of $A$. Thus $p(D)$ is the zero matrix. Thus
$p(A) = P \cdot 0 \cdot P^{-1} = 0$.

9. If I — A were not invertible, then the equation 

(I — A)x = 0 would have a nontrivial solution x. Then 
x — Ax = 0 and Ax = 1 • x, which shows that A would 
have 1 as an eigenvalue. This cannot happen if all the 
eigenvalues are less than 1 in magnitude. So I — A must be 
invertible. 

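11. a. Take $\mathbf{x}$ in $H$. Then $\mathbf{x} = c\mathbf{u}$ for some scalar $c$. So
$A\mathbf{x} = A(c\mathbf{u}) = c(A\mathbf{u}) = c(\lambda\mathbf{u}) = (c\lambda)\mathbf{u}$, which shows
that $A\mathbf{x}$ is in $H$.

b. Let $\mathbf{x}$ be a nonzero vector in $K$. Since $K$ is
one-dimensional, $K$ must be the set of all scalar
multiples of $\mathbf{x}$. If $K$ is invariant under $A$, then $A\mathbf{x}$ is in
$K$ and hence $A\mathbf{x}$ is a multiple of $\mathbf{x}$. Thus $\mathbf{x}$ is an
eigenvector of $A$.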

13. 1,3,7 

15. Replace $a$ by $a - \lambda$ in the determinant formula from
Exercise 16 in Chapter 3 Supplementary Exercises:

$$\det(A - \lambda I) = (a - b - \lambda)^{n-1}[a - \lambda + (n - 1)b]$$

This determinant is zero only if $a - b - \lambda = 0$ or
$a - \lambda + (n - 1)b = 0$. Thus $\lambda$ is an eigenvalue of $A$ if and
only if $\lambda = a - b$ or $\lambda = a + (n - 1)b$. From the formula
for $\det(A - \lambda I)$ above, the algebraic multiplicity is $n - 1$
for $a - b$ and 1 for $a + (n - 1)b$.

17. $\det(A - \lambda I) = (a_{11} - \lambda)(a_{22} - \lambda) - a_{12}a_{21} = \lambda^2 - (a_{11} + a_{22})\lambda + (a_{11}a_{22} - a_{12}a_{21}) = \lambda^2 - (\operatorname{tr} A)\lambda + \det A$.
Use the quadratic formula to solve the characteristic
equation:

$$\lambda = \frac{\operatorname{tr} A \pm \sqrt{(\operatorname{tr} A)^2 - 4\det A}}{2}$$

The eigenvalues are both real if and only if the discriminant
is nonnegative, that is, $(\operatorname{tr} A)^2 - 4\det A \ge 0$. This inequality
simplifies to $(\operatorname{tr} A)^2 \ge 4\det A$ and $\left(\dfrac{\operatorname{tr} A}{2}\right)^2 \ge \det A$.

19. $C_p = \begin{bmatrix} 0 & 1 \\ -6 & 5 \end{bmatrix}$; $\det(C_p - \lambda I) = 6 - 5\lambda + \lambda^2 = p(\lambda)$

21. If $p$ is a polynomial of order 2, then a calculation such as in
Exercise 19 shows that the characteristic polynomial of $C_p$
is $p(\lambda) = (-1)^2p(\lambda)$, so the result is true for $n = 2$.
Suppose the result is true for $n = k$ for some $k \ge 2$, and
consider a polynomial $p$ of degree $k + 1$. Then expanding
$\det(C_p - \lambda I)$ by cofactors down the first column, the
determinant of $C_p - \lambda I$ equals

$$(-\lambda)\det\begin{bmatrix} -\lambda & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & & & 1 \\ -a_1 & -a_2 & \cdots & -a_k - \lambda \end{bmatrix} + (-1)^{k+1}a_0$$

The $k \times k$ matrix shown is $C_q - \lambda I$, where
$q(t) = a_1 + a_2t + \cdots + a_kt^{k-1} + t^k$. By the induction
assumption, the determinant of $C_q - \lambda I$ is $(-1)^kq(\lambda)$. Thus


13. $5\sqrt{5}$

15. Not orthogonal

17. Orthogonal

19. Refer to the Study Guide after you have written your
answers.

21. Hint: Use Theorems 3 and 2 from Section 2.1.

23. $\mathbf{u} \cdot \mathbf{v} = 0$, $\|\mathbf{u}\|^2 = 30$, $\|\mathbf{v}\|^2 = 101$,
$\|\mathbf{u} + \mathbf{v}\|^2 = (-5)^2 + (-9)^2 + 5^2 = 131 = 30 + 101$

25. The set of all multiples of $\begin{bmatrix} -b \\ a \end{bmatrix}$ (when $\mathbf{v} \ne \mathbf{0}$)

27. Hint: Use the definition of orthogonality.

29. Hint: Consider a typical vector $\mathbf{w} = c_1\mathbf{v}_1 + \cdots + c_p\mathbf{v}_p$
in $W$.

31. Hint: If $\mathbf{x}$ is in $W^\perp$, then $\mathbf{x}$ is orthogonal to every vector
in $W$.


33. [M] State your conjecture and verify it algebraically.


$$\begin{aligned} \det(C_p - \lambda I) &= (-1)^{k+1}a_0 + (-\lambda)(-1)^kq(\lambda) \\ &= (-1)^{k+1}[a_0 + \lambda(a_1 + \cdots + a_k\lambda^{k-1} + \lambda^k)] \\ &= (-1)^{k+1}p(\lambda) \end{aligned}$$

So the formula holds for $n = k + 1$ when it holds for
$n = k$. By the principle of induction, the formula for
$\det(C_p - \lambda I)$ is true for all $n \ge 2$.
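The formula $\det(C_p - \lambda I) = (-1)^n p(\lambda)$ can be spot-checked numerically for small $n$. The sketch below assumes the companion-matrix layout used in the induction above (ones on the superdiagonal, bottom row $-a_0, \ldots, -a_{n-1}$); the helper names and the cubic test polynomial are made up for this example.

```python
from fractions import Fraction as F


def companion(coeffs):
    """Companion matrix of p(t) = a0 + a1 t + ... + a_{n-1} t^{n-1} + t^n."""
    n = len(coeffs)
    C = [[F(0)] * n for _ in range(n)]
    for i in range(n - 1):
        C[i][i + 1] = F(1)               # ones above the diagonal
    C[n - 1] = [F(-a) for a in coeffs]   # bottom row: -a0, ..., -a_{n-1}
    return C


def det(M):
    """Determinant by Gaussian elimination with exact fractions."""
    M = [row[:] for row in M]
    n, sign, result = len(M), 1, F(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return F(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            sign = -sign                 # a row swap flips the sign
        result *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return sign * result


def char_poly_at(C, lam):
    """det(C - lam * I)."""
    n = len(C)
    return det([[C[i][j] - (lam if i == j else 0) for j in range(n)]
                for i in range(n)])


def p_at(coeffs, lam):
    return sum(a * lam**i for i, a in enumerate(coeffs)) + lam**len(coeffs)


quadratic = [6, -5]          # p(t) = 6 - 5t + t^2, as in Exercise 19
cubic = [-6, 11, -6]         # hypothetical: t^3 - 6t^2 + 11t - 6
agree = [char_poly_at(companion(c), lam) == (-1) ** len(c) * p_at(c, lam)
         for c in (quadratic, cubic) for lam in range(-3, 4)]
print(all(agree))
```

Agreement at more than $n$ values of $\lambda$ is enough to identify two degree-$n$ polynomials, so the check at seven points is conclusive for these two cases.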

23. From Exercise 22, the columns of the Vandermonde matrix
$V$ are eigenvectors of $C_p$, corresponding to the eigenvalues
$\lambda_1$, $\lambda_2$, $\lambda_3$ (the roots of the polynomial $p$). Since these
eigenvalues are distinct, the eigenvectors form a linearly
independent set, by Theorem 2 in Section 5.1. Thus $V$ has
linearly independent columns and hence is invertible, by the
Invertible Matrix Theorem. Finally, since the columns of $V$
are eigenvectors of $C_p$, the Diagonalization Theorem
(Theorem 5 in Section 5.3) shows that $V^{-1}C_pV$ is diagonal.

25. [M] If your matrix program computes eigenvalues and
eigenvectors by iterative methods rather than symbolic
calculations, you may have some difficulties. You should
find that $AP - PD$ has extremely small entries and
$PDP^{-1}$ is close to $A$. (This was true just a few years ago,
but the situation could change as matrix programs continue
to improve.) If you constructed $P$ from the program's
eigenvectors, check the condition number of $P$. This may
indicate that you do not really have three linearly
independent eigenvectors.


Chapter 6 


Section 6.1, page 338 


1.5,8,1 

7. $\sqrt{35}$

9. $\begin{bmatrix} -.6 \\ .8 \end{bmatrix}$

11. $\begin{bmatrix} 7/\sqrt{69} \\ 2/\sqrt{69} \\ 4/\sqrt{69} \end{bmatrix}$


Section 6.2, page 346 

1. Not orthogonal 3. Not orthogonal 5. Orthogonal 

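7. Show $\mathbf{u}_1 \cdot \mathbf{u}_2 = 0$, mention Theorem 4, and observe that two
linearly independent vectors in $\mathbb{R}^2$ form a basis. Then obtain

$$\mathbf{x} = \frac{39}{13}\begin{bmatrix} 2 \\ -3 \end{bmatrix} + \frac{26}{52}\begin{bmatrix} 6 \\ 4 \end{bmatrix} = 3\begin{bmatrix} 2 \\ -3 \end{bmatrix} + \frac{1}{2}\begin{bmatrix} 6 \\ 4 \end{bmatrix}$$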
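The weights in Exercise 7 come from the formula $c_j = (\mathbf{x} \cdot \mathbf{u}_j)/(\mathbf{u}_j \cdot \mathbf{u}_j)$ for an orthogonal basis. The sketch below recomputes them; the vector $\mathbf{x} = (9, -7)$ is an assumption (it is not printed in this answer) chosen because it yields the quotients $39/13$ and $26/52$ shown above.

```python
from fractions import Fraction as F


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


u1 = [F(2), F(-3)]
u2 = [F(6), F(4)]
x = [F(9), F(-7)]            # assumed vector being expanded

# Weight on each basis vector: (x . u) / (u . u), valid because u1 . u2 = 0.
weights = [dot(x, u) / dot(u, u) for u in (u1, u2)]
rebuilt = [weights[0] * a + weights[1] * b for a, b in zip(u1, u2)]
print(dot(u1, u2), weights, rebuilt)
```

No linear system has to be solved, which is the practical advantage of an orthogonal basis.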


9. Show $\mathbf{u}_1 \cdot \mathbf{u}_2 = 0$, $\mathbf{u}_1 \cdot \mathbf{u}_3 = 0$, and $\mathbf{u}_2 \cdot \mathbf{u}_3 = 0$. Mention
Theorem 4, and observe that three linearly independent
vectors in $\mathbb{R}^3$ form a basis. Then obtain


X = fill - f^u 2 + ^u 3 = fu, - |u 2 + 2 u 3 



15. y-y 



distance is 1 



' 1 / 73 " 


■-1/72" 

17. 

1/73 

5 

0 


. 1 / 73 . 


1 /72. 


19. Orthonormal 21. Orthonormal 
23. See the Study Guide. 

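25. Hint: $\|U\mathbf{x}\|^2 = (U\mathbf{x})^T(U\mathbf{x})$. Also, parts (a) and (c) follow
from (b).


27. Hint: You need two theorems, one of which applies only to
square matrices.

29. Hint: If you have a candidate for an inverse, you can check
to see whether the candidate works.

31. Suppose $\hat{\mathbf{y}} = \dfrac{\mathbf{y} \cdot \mathbf{u}}{\mathbf{u} \cdot \mathbf{u}}\mathbf{u}$. Replace $\mathbf{u}$ by $c\mathbf{u}$ with $c \ne 0$; then

$$\frac{\mathbf{y} \cdot (c\mathbf{u})}{(c\mathbf{u}) \cdot (c\mathbf{u})}(c\mathbf{u}) = \frac{c(\mathbf{y} \cdot \mathbf{u})}{c^2\,\mathbf{u} \cdot \mathbf{u}}(c)\mathbf{u} = \hat{\mathbf{y}}$$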





















































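33. Let $L = \operatorname{Span}\{\mathbf{u}\}$, where $\mathbf{u}$ is nonzero, and let
$T(\mathbf{x}) = \operatorname{proj}_L \mathbf{x}$. By definition,

$$T(\mathbf{x}) = \frac{\mathbf{x} \cdot \mathbf{u}}{\mathbf{u} \cdot \mathbf{u}}\mathbf{u} = (\mathbf{x} \cdot \mathbf{u})(\mathbf{u} \cdot \mathbf{u})^{-1}\mathbf{u}$$

For $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$ and any scalars $c$ and $d$, properties of the
inner product (Theorem 1) show that

$$\begin{aligned} T(c\mathbf{x} + d\mathbf{y}) &= [(c\mathbf{x} + d\mathbf{y}) \cdot \mathbf{u}](\mathbf{u} \cdot \mathbf{u})^{-1}\mathbf{u} \\ &= [c(\mathbf{x} \cdot \mathbf{u}) + d(\mathbf{y} \cdot \mathbf{u})](\mathbf{u} \cdot \mathbf{u})^{-1}\mathbf{u} \\ &= c(\mathbf{x} \cdot \mathbf{u})(\mathbf{u} \cdot \mathbf{u})^{-1}\mathbf{u} + d(\mathbf{y} \cdot \mathbf{u})(\mathbf{u} \cdot \mathbf{u})^{-1}\mathbf{u} \\ &= cT(\mathbf{x}) + dT(\mathbf{y}) \end{aligned}$$

Thus $T$ is linear.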
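The identity $T(c\mathbf{x} + d\mathbf{y}) = cT(\mathbf{x}) + dT(\mathbf{y})$ proved above can be confirmed on sample data. In the sketch below every vector and scalar is a hypothetical test value; only the projection formula itself comes from the argument.

```python
from fractions import Fraction as F


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def proj(x, u):
    """Orthogonal projection of x onto the line spanned by a nonzero u."""
    scale = dot(x, u) / dot(u, u)
    return [scale * ui for ui in u]


# Hypothetical test data.
u = [F(1), F(2), F(2)]
x = [F(3), F(0), F(-1)]
y = [F(1), F(4), F(2)]
c, d = F(2), F(-3)

lhs = proj([c * a + d * b for a, b in zip(x, y)], u)
rhs = [c * a + d * b for a, b in zip(proj(x, u), proj(y, u))]
print(lhs, rhs)
```

Exact fractions are used so that the two sides can be compared with `==` rather than a floating-point tolerance.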


Section 6.3, page 354 


11 . 


1. X = -fll! - |u 2 + |u 3 + 2u 4; X = 


2 

9 


2 

3 


0" 


" 10" 

-2 

+ 

-6 

4 

-2 

-2 


2 



1 

_ 1 


1 

_ 1 

3. 

1 

^ 0 

_ 1 

5. 

2 

6_ 


= y 



'10/3' 


'-7/3' 

7 . y = 

2/3 

+ 

7/3 


8/3 


7/3 


9 . y = 


"2" 


2" 

4 

+ 

-1 

0 

3 

0 


-1 


1 

LO 

_1 


1 

_ 1 

-1 

13. 

-3 

1 

-2 

1 

1_ 


1 

1_ 


15. 740 


17. a. U T U = 


UU T = 


1 0 
0 1 

8/9 

-2/9 

2/9 


-2/9 

5/9 

4/9 


2/9 

4/9 

5/9 


Section 6.4, page 360 


1. 


b. proj h/ y = 6ui + 3u 2 = 


19. Any multiple of 


0 


" 0 " 

2/5 

, such as 

2 

- 1 / 5 . 


1 


5. 


3" 


-1 


0 

9 

5 

3. 

-1_ 


1 

LO 

1_ 


r 


5" 


-4 


1 

7. 

0 

9 

-4 

1 


-1 

— 


2 

5 

1 



3 

9 

3/2 


3/2 


2/V30 
-5/V30 

1/ a /30_ 


9. 


1 


1 


1 

_1 

1 


3 


1 

-1 

9 

3 

9 

1 

1 

1_ 


-1 


1 

1_ 


11 . 


9 

'2/76" 

1/76 

.1/76. 

1 


1 

OJ 

1 

-1 


0 

-1 

9 

3 

1 


-3 

1 


1 

CO 

1 


2 

0 

2 

2 

-2 


13. = 


6 

0 


12 

6 


15. Q = 


R = 


1 /a/5 
-1/V5 
-1/V5 
1/V5 
1/V5 

V5 - 
0 
0 


1/2 

0 

1/2 

1/2 

1/2 


1/2 

0 

1/2 

1/2 

1/2 


V5 

6 

0 


4 V5 
-2 
4 


17. See the S/mfy Guide. 


19. Suppose $\mathbf{x}$ satisfies $R\mathbf{x} = \mathbf{0}$; then $QR\mathbf{x} = Q\mathbf{0} = \mathbf{0}$, and
$A\mathbf{x} = \mathbf{0}$. Since the columns of $A$ are linearly independent, $\mathbf{x}$
must be zero. This fact, in turn, shows that the columns of
$R$ are linearly independent. Since $R$ is square, it is
invertible, by the Invertible Matrix Theorem.

21. Denote the columns of $Q$ by $\mathbf{q}_1, \ldots, \mathbf{q}_n$. Note that $n \le m$,
because $A$ is $m \times n$ and has linearly independent columns.
Use the fact that the columns of $Q$ can be extended to an
orthonormal basis for $\mathbb{R}^m$, say, $\{\mathbf{q}_1, \ldots, \mathbf{q}_m\}$. (The Study
Guide describes one method.) Let $Q_0 = [\,\mathbf{q}_{n+1} \ \cdots \ \mathbf{q}_m\,]$


" 2 " 


2 

and 0 i = [0 0 

o].1 

4 

5 

,and (UU T )y = 

1 

4 ^ 

1 _ 

multiplication, Q\ 

1 1 

0 

1_1 


21. Write your answers before checking the Study Guide. 

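23. Hint: Use Theorem 3 and the Orthogonal Decomposition
Theorem. For the uniqueness, suppose $A\mathbf{p} = \mathbf{b}$ and
$A\mathbf{p}_1 = \mathbf{b}$, and consider the equations $\mathbf{p} = \mathbf{p}_1 + (\mathbf{p} - \mathbf{p}_1)$
and $\mathbf{p} = \mathbf{p} + \mathbf{0}$.

25. [M] $U$ has orthonormal columns, by Theorem 6 in
Section 6.2, because $U^TU = I_4$. The closest point to $\mathbf{y}$ in
$\operatorname{Col} U$ is the orthogonal projection $\hat{\mathbf{y}}$ of $\mathbf{y}$ onto $\operatorname{Col} U$. From
Theorem 10,

$$\hat{\mathbf{y}} = UU^T\mathbf{y} = (1.2, .4, 1.2, 1.2, .4, 1.2, .4, .4)$$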


= QR = A. 


23. Hint: Partition R as a 2 x 2 block matrix. 

25. [M] The diagonal entries of R are 20, 6, 10.3923, and
7.0711, to four decimal places.


Section 6.5, page 368 


1. a. 


6 -11 

11 22 


X\ 

*2 


-4 

11 


b. x = 


3 

2 


3. a 


6 

6 


6 

42 


X\ 

*2 


6 

6 


b. x = 


4/3 

-1/3 


5. x = 


5" 


-3 

+ X3 

0 



-1 

1 

1 


7. $2\sqrt{5}$

9. a. b =




2/7 

1/7 



r 3 i 



yv 

1 


2/3 

11. a. b = 

A 

b. x = 

0 


_-i_ 


- 1 / 3 . 



11“ 


T 

13. An = 

-11 

,A\ = 

-12 


11 


7 



1 

0 

1 _ 


1 

1_ 

b — An = 

2 

_—6_ 

, b — Ax = 

1 

1_ 


. No, u could not 


possibly be a least-squares solution of Ax = b. Why? 


15. x = 


4 

-1 


17. See the Study Guide. 


19. a. If $Ax = 0$, then $A^TAx = A^T0 = 0$. This shows that
Nul A is contained in Nul $A^TA$.

b. If $A^TAx = 0$, then $x^TA^TAx = x^T0 = 0$. So
$(Ax)^T(Ax) = 0$ (which means that $\|Ax\|^2 = 0$), and
hence $Ax = 0$. This shows that Nul $A^TA$ is contained
in Nul A.


21. Hint: For part (a), use an important theorem from Chapter 2. 

23. By Theorem 14, $\hat{b} = A\hat{x} = A(A^TA)^{-1}A^Tb$. The matrix
$A(A^TA)^{-1}A^T$ occurs frequently in statistics, where it is
sometimes called the hat-matrix.


25. The normal equations are

$\begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 6 \\ 6 \end{bmatrix}$,

whose solution is the set of $(x, y)$ such that $x + y = 3$. The
solutions correspond to points on the line midway between
the lines $x + y = 2$ and $x + y = 4$.


Section 6.6, page 376 
1. y = .9 + .4x 3. y = 1.1 + 1.3x 



y = X ($ + € , where y = 




cos 1 
cos 2 
cos 3 


sin 1 
sin 2 
sin 3 


11. [M] $\beta = 1.45$ and $e = .811$; the orbit is an ellipse. The
equation $r = \beta/(1 - e \cdot \cos\vartheta)$ produces $r = 1.33$ when
$\vartheta = 4.6$.


13. [M] a. $y = -.8558 + 4.7025t + 5.5554t^2 - .0274t^3$

b. The velocity function is

$v(t) = 4.7025 + 11.1108t - .0822t^2$, and
$v(4.5) = 53.0$ ft/sec.
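
The differentiation in part (b) of Exercise 13 is mechanical, so it can be checked with a few lines of code. The following is only a sketch: the helper names `poly_derivative` and `poly_eval` are introduced here for illustration, and the coefficients are the rounded values printed in part (a).

```python
def poly_derivative(coeffs):
    """Differentiate a polynomial whose coefficients are listed by increasing power of t."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def poly_eval(coeffs, t):
    """Evaluate the polynomial at t using Horner's rule."""
    value = 0.0
    for c in reversed(coeffs):
        value = value * t + c
    return value


# Rounded least-squares coefficients from part (a): c0 + c1*t + c2*t^2 + c3*t^3
position = [-0.8558, 4.7025, 5.5554, -0.0274]
velocity = poly_derivative(position)

print([round(c, 4) for c in velocity])     # coefficients of v(t)
print(round(poly_eval(velocity, 4.5), 1))  # velocity estimate at t = 4.5
```

Because the inputs are rounded, the computed velocity agrees with the printed answer only to the displayed precision.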

15. Hint: Write X and y as in equation (1), and compute $X^TX$
and $X^Ty$.

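17. a. The mean of the x-data is $\bar{x} = 5.5$. The data in
mean-deviation form are $(-3.5, 1)$, $(-.5, 2)$, $(1.5, 3)$,
and $(2.5, 3)$. The columns of X are orthogonal because
the entries in the second column sum to 0.

b. $\begin{bmatrix} 4 & 0 \\ 0 & 21 \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} = \begin{bmatrix} 9 \\ 7.5 \end{bmatrix}$,

$y = \frac{9}{4} + \frac{5}{14}x^* = \frac{9}{4} + \frac{5}{14}(x - 5.5)$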
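
The computation in Exercise 17 can be sketched in code. Because the columns of the design matrix are orthogonal in mean-deviation form, $X^TX$ is diagonal and each coefficient is found by a single division. The function name `fit_line_mean_deviation` is an assumption made for this sketch, and the x-values 2, 5, 7, 8 are inferred by adding the mean 5.5 back to the deviations listed in part (a).

```python
def fit_line_mean_deviation(xs, ys):
    """Least-squares line y = beta0 + beta1 * (x - x_bar) using mean-deviation form."""
    n = len(xs)
    x_bar = sum(xs) / n
    x_star = [x - x_bar for x in xs]
    sum_sq = sum(d * d for d in x_star)
    if sum_sq == 0:
        raise ValueError("at least two distinct x-values are required")
    # X^T X = diag(n, sum_sq), so the normal equations decouple.
    beta0 = sum(ys) / n
    beta1 = sum(d * y for d, y in zip(x_star, ys)) / sum_sq
    return x_bar, beta0, beta1


# Data inferred from the mean-deviation pairs in part (a) (an assumption).
xs = [2, 5, 7, 8]
ys = [1, 2, 3, 3]
x_bar, beta0, beta1 = fit_line_mean_deviation(xs, ys)
print(x_bar, beta0, beta1)
```

Under these assumptions the two coefficients should reproduce 9/4 and 5/14.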

19. Hint: The equation has a nice geometric interpretation. 


Section 6.7, page 384 

1. a. 3, \/l05,225 b. All multiples of j 

3. 28 5. 5a/2, 3\/3 7. | + 

9. a. Constant polynomial, $p(t) = 5$

b. $t^2 - 5$ is orthogonal to $p_0$ and $p_1$; values:
$(4, -4, -4, 4)$; answer: $q(t) = \frac{1}{4}(t^2 - 5)$

11. 

13. Verify each of the four axioms. For instance: 


5. If two data points have different x-coordinates, then the two 
columns of the design matrix X cannot be multiples of each 
other and hence are linearly independent. By Theorem 14 in 
Section 6.5, the normal equations have a unique solution. 


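7. a. $y = X\beta + \epsilon$, where $y = \begin{bmatrix} 1.8 \\ 2.7 \\ 3.4 \\ 3.8 \\ 3.9 \end{bmatrix}$, $X = \begin{bmatrix} 1 & 1 \\ 2 & 4 \\ 3 & 9 \\ 4 & 16 \\ 5 & 25 \end{bmatrix}$, $\epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \\ \epsilon_5 \end{bmatrix}$

b. [M] $y = 1.76x - .20x^2$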


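1. $\langle u, v \rangle = (Au) \cdot (Av)$    Definition
$= (Av) \cdot (Au)$    Property of the dot product
$= \langle v, u \rangle$    Definition

15. $\langle u, cv \rangle = \langle cv, u \rangle$    Axiom 1
$= c\langle v, u \rangle$    Axiom 3
$= c\langle u, v \rangle$    Axiom 1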


17. Hint: Compute 4 times the right-hand side. 

19. $\langle u, v \rangle = \sqrt{a}\sqrt{b} + \sqrt{b}\sqrt{a} = 2\sqrt{ab}$,
$\|u\|^2 = (\sqrt{a})^2 + (\sqrt{b})^2 = a + b$. Since a and b are
nonnegative, $\|u\| = \sqrt{a + b}$. Similarly, $\|v\| = \sqrt{b + a}$.
By Cauchy-Schwarz, $2\sqrt{ab} \le \sqrt{a + b}\sqrt{b + a} = a + b$.
Hence, $\sqrt{ab} \le \frac{a + b}{2}$.


21 . 0 


23. 2/V5 


25. $1$, $t$, $3t^2 - 1$

27. [M] The new orthogonal polynomials are multiples of
$-17t + 5t^3$ and $72 - 155t^2 + 35t^4$. Scale these
polynomials so their values at $-2$, $-1$, $0$, $1$, and $2$ are small
integers.

Section 6.8, page 391 

1. y = 2 + \t 

3. pit) = 4/?o — Ap\ — ,5p 2 + -2p3 

= 4 - At - ,5(t 2 - 2) + .2 (It 3 - p) 

(This polynomial happens to fit the data exactly.) 

5. Use the identity

$\sin mt \sin nt = \frac{1}{2}[\cos(mt - nt) - \cos(mt + nt)]$

7. Use the identity $\cos^2 kt = \frac{1 + \cos 2kt}{2}$.

9. $\pi + 2\sin t + \sin 2t + \frac{2}{3}\sin 3t$ [Hint: Save time by using
the results from Example 4.]

11. \ - \cos2t (Why?) 

13. Hint: Take functions f and g in $C[0, 2\pi]$, and fix an integer
$m \ge 0$. Write the Fourier coefficient of $f + g$ that involves
$\cos mt$, and write the Fourier coefficient that involves
$\sin mt$ $(m > 0)$.

15. [M] The cubic curve is the graph of

$g(t) = -.2685 + 3.6095t + 5.8576t^2 - .0477t^3$. The
velocity at $t = 4.5$ seconds is $g'(4.5) = 53.4$ ft/sec. This is
about .7% faster than the estimate obtained in Exercise 13
in Section 6.6.


Chapter 6 Supplementary Exercises, page 392 


a. 

F 

b. 

T 

c. T 

d. 

F 

e. 

F 

f. 

T 

§• 

T 

h. 

T 

i. F 

• 

J- 

T 

k. 

T 

1 . 

F 

m. 

s. 

T 

F 

n. 

F 

0. F 

P- 

T 

q- 

T 

r. 

F 


2. Hint: If $\{v_1, v_2\}$ is an orthonormal set and $x = c_1v_1 + c_2v_2$,
then the vectors $c_1v_1$ and $c_2v_2$ are orthogonal, and

$\|x\|^2 = \|c_1v_1 + c_2v_2\|^2 = \|c_1v_1\|^2 + \|c_2v_2\|^2$
$= (|c_1|\,\|v_1\|)^2 + (|c_2|\,\|v_2\|)^2 = |c_1|^2 + |c_2|^2$

(Explain why.) So the stated equality holds for $p = 2$.
Suppose that the equality holds for $p = k$, with $k \ge 2$, let
$\{v_1, \ldots, v_{k+1}\}$ be an orthonormal set, and consider

$x = c_1v_1 + \cdots + c_kv_k + c_{k+1}v_{k+1} = u_k + c_{k+1}v_{k+1}$,

where $u_k = c_1v_1 + \cdots + c_kv_k$.


3. Given x and an orthonormal set $\{v_1, \ldots, v_p\}$ in $\mathbb{R}^n$, let $\hat{x}$ be
the orthogonal projection of x onto the subspace spanned by
$v_1, \ldots, v_p$. By Theorem 10 in Section 6.3,

$\hat{x} = (x \cdot v_1)v_1 + \cdots + (x \cdot v_p)v_p$

By Exercise 2, $\|\hat{x}\|^2 = |x \cdot v_1|^2 + \cdots + |x \cdot v_p|^2$. Bessel's
inequality follows from the fact that $\|\hat{x}\|^2 \le \|x\|^2$, noted
before the statement of the Cauchy-Schwarz inequality, in
Section 6.7.


5. Suppose $(Ux) \cdot (Uy) = x \cdot y$ for all x, y in $\mathbb{R}^n$, and let
$e_1, \ldots, e_n$ be the standard basis for $\mathbb{R}^n$. For
$j = 1, \ldots, n$, $Ue_j$ is the jth column of U. Since
$\|Ue_j\|^2 = (Ue_j) \cdot (Ue_j) = e_j \cdot e_j = 1$, the columns of U
are unit vectors; since $(Ue_j) \cdot (Ue_k) = e_j \cdot e_k = 0$ for
$j \ne k$, the columns are pairwise orthogonal.

7. Hint: Compute $Q^TQ$, using the fact that
$(uu^T)^T = u^{TT}u^T = uu^T$.


9. Let W = Span $\{u, v\}$. Given z in $\mathbb{R}^n$, let $\hat{z} = \mathrm{proj}_W z$. Then
$\hat{z}$ is in Col A, where $A = [\, u \; v \,]$, say, $\hat{z} = A\hat{x}$ for some $\hat{x}$
in $\mathbb{R}^2$. So $\hat{x}$ is a least-squares solution of $Ax = z$. The
normal equations can be solved to produce $\hat{x}$, and then $\hat{z}$ is
found by computing $A\hat{x}$.
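
The procedure described in Exercise 9 can be sketched as follows. This is a minimal illustration, not the text's algorithm verbatim: the function name `project_onto_span` is hypothetical, and the 2 x 2 normal equations are solved with Cramer's rule, which assumes u and v are linearly independent.

```python
def dot(a, b):
    """Standard dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_onto_span(u, v, z):
    """Project z onto Span{u, v} by solving the normal equations A^T A x = A^T z, A = [u v]."""
    a11, a12, a22 = dot(u, u), dot(u, v), dot(v, v)
    b1, b2 = dot(u, z), dot(v, z)
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("u and v must be linearly independent")
    # Cramer's rule for the symmetric 2 x 2 system.
    x1 = (b1 * a22 - a12 * b2) / det
    x2 = (a11 * b2 - a12 * b1) / det
    z_hat = [x1 * ui + x2 * vi for ui, vi in zip(u, v)]
    return (x1, x2), z_hat


weights, z_hat = project_onto_span([1, 0, 0], [1, 1, 0], [2, 3, 4])
print(weights, z_hat)
```

In the sample call, W is the plane of vectors whose third entry is zero, so the projection simply drops the third entry of z.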


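11. Hint: Let $x = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$, $b = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$, $v = \begin{bmatrix} 1 \\ -2 \\ 5 \end{bmatrix}$, and

$A = \begin{bmatrix} v^T \\ v^T \\ v^T \end{bmatrix} = \begin{bmatrix} 1 & -2 & 5 \\ 1 & -2 & 5 \\ 1 & -2 & 5 \end{bmatrix}$. The given set of
equations is $Ax = b$, and the set of all least-squares
solutions coincides with the set of solutions of
$A^TAx = A^Tb$ (Theorem 13 in Section 6.5). Study this
equation, and use the fact that $(vv^T)x = v(v^Tx) = (v^Tx)v$,
because $v^Tx$ is a scalar.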


13. a. The row-column calculation of Au shows that each row
of A is orthogonal to every u in Nul A. So each row of
A is in $(\mathrm{Nul}\, A)^\perp$. Since $(\mathrm{Nul}\, A)^\perp$ is a subspace, it must
contain all linear combinations of the rows of A; hence
$(\mathrm{Nul}\, A)^\perp$ contains Row A.

b. If rank $A = r$, then dim Nul $A = n - r$, by the Rank
Theorem. By Exercise 24(c) in Section 6.3,

$\dim \mathrm{Nul}\, A + \dim (\mathrm{Nul}\, A)^\perp = n$

So dim $(\mathrm{Nul}\, A)^\perp$ must be r. But Row A is an
r-dimensional subspace of $(\mathrm{Nul}\, A)^\perp$, by the Rank
Theorem and part (a). Therefore, Row A must coincide
with $(\mathrm{Nul}\, A)^\perp$.

c. Replace A by $A^T$ in part (b) and conclude that Row $A^T$
coincides with $(\mathrm{Nul}\, A^T)^\perp$. Since Row $A^T$ = Col A, this
proves (c).

15. If $A = URU^T$ with U orthogonal, then A is similar to R
(because U is invertible and $U^T = U^{-1}$) and so A has the
same eigenvalues as R (by Theorem 4 in Section 5.2),
namely, the n real numbers on the diagonal of R.

17. [M] $\frac{\|\Delta x\|}{\|x\|} = .4618$,

$\mathrm{cond}(A) \times \frac{\|\Delta b\|}{\|b\|} = 3363 \times (1.548 \times 10^{-4}) = .5206$.

Observe that $\|\Delta x\|/\|x\|$ almost equals cond(A) times
$\|\Delta b\|/\|b\|$.

19. [M] $\frac{\|\Delta x\|}{\|x\|} = 7.178 \times 10^{-8}$, $\frac{\|\Delta b\|}{\|b\|} = 2.832 \times 10^{-4}$.

Observe that the relative change in x is much smaller than
the relative change in b. In fact, since

$\mathrm{cond}(A) \times \frac{\|\Delta b\|}{\|b\|} = 23{,}683 \times (2.832 \times 10^{-4}) = 6.707$

the theoretical bound on the relative change in x is 6.707
(to four significant figures). This exercise shows that
even when a condition number is large, the relative error
in a solution need not be as large as you might expect.
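
The comparison made in Exercises 17 and 19 is simple arithmetic, and the sketch below repeats it. The function name `relative_error_bound` is an assumption introduced here; the numbers are the rounded values printed in the two answers, so the results agree with the text only to the displayed precision.

```python
def relative_error_bound(cond_a, rel_change_b):
    """Upper bound cond(A) * (relative change in b) on the relative change in x."""
    return cond_a * rel_change_b


# Rounded values reported in Exercises 17 and 19.
bound_17 = relative_error_bound(3363, 1.548e-4)
bound_19 = relative_error_bound(23683, 2.832e-4)
observed_17 = 0.4618
observed_19 = 7.178e-8

# Ratio of the observed relative change to the theoretical bound.
print(round(bound_17, 4), observed_17 / bound_17)
print(round(bound_19, 3), observed_19 / bound_19)
```

In Exercise 17 the observed change is close to the bound, while in Exercise 19 it is many orders of magnitude smaller.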

Chapter 7 

Section 7.1, page 401 


27. $(Ax) \cdot y = (Ax)^Ty = x^TA^Ty = x^TAy = x \cdot (Ay)$, because
$A^T = A$.

29. Hint: Use an orthogonal diagonalization of A, or appeal to 
Theorem 2. 

31. The Diagonalization Theorem in Section 5.3 says that the
columns of P are (linearly independent) eigenvectors
corresponding to the eigenvalues of A listed on the diagonal
of D. So P has exactly k columns of eigenvectors
corresponding to $\lambda$. These k columns form a basis for the
eigenspace.


1. Symmetric 3. Not symmetric 5. Symmetric 


33. $A = 8u_1u_1^T + 6u_2u_2^T + 3u_3u_3^T$


15. P = 


1/V5 2/V5 


,D = 


0 11 


17. P = 


D = 


19. P = 


-l/>/2 

0 

1/V2 

-4 0 

0 4 

0 0 

-1 /Vs 

2/V5 


1/V6 

2/V6 

1/V6 


0 

0 

7 


4/V45 

2/V45 


0 5/V45 


D = 


7 0 

0 7 

0 0 


0 

0 

2 


1/V3 

1/V3 

1/V3 


-2/3' 
— 1/3 
2/3 _ 




r.6 .si 



1/2 

- 1/2 

0" 

7. 

Orthogonal, 

.8 —.6 _ 


= 8 

- 1/2 

1/2 

0 






0 

0 

0_ 

9. 

Orthogonal, 

'-4/5 

3/5 

3/5 “ 
4/5 


1/6 

1/6 

-2/e 





+ 6 

1/6 

1/6 

-2/e 

11 . 

Not orthogonal 



- 2/6 

- 2/6 

4/e 


13. P = 

1/V2 

-1/V2 

,7> = 

"4 0" 


■ 1/3 

1/3 

1/3“ 

1/V2 

1/V2 

_° 2 _ 

+ 3 

1/3 

1/3 

1/3 




— 



1/3 

1/3 

1/3 


35. Hint: $(uu^T)x = u(u^Tx) = (u^Tx)u$, because $u^Tx$ is a scalar.




-1 

1 

1 

1 


1 

1 

1 

1 

-1 

5 

37. [M] P = - 

-1 

1 

-1 

-1 

— 


1 

1 

-1 

1_ 


D = 


19 

0 

0 

0 


0 

11 

0 


0 

0 

5 


0 

0 

0 


39. [M] P = 


0 

0 -11_ 


riA/2 

3/V50 -2/5 

-2/51 

0 

4/V50 -1/5 

4/5 

0 

4/V50 4/5 

-1/5 

L1/V2 

-3/V50 2/5 

2/5 J 


21. P = 

■ 0 

0 

1/V2 

.-1/V2 

1/V2 1/2 

-1/V2 1/2 

0 -1/2 

0 -1/2 


" 1 

0 

0 

0" 


D = 

0 

1 

0 

0 


0 

0 

5 

0 



0 

0 

0 

9 


23. P = 

i/a /3 

1/V3 

i/a /3 

1 1 

[CN Lcs 

-1/V6 

-I/a/6 , 

2 /V6 


"2 

0 

0" 



D = 

0 

5 

0 




0 

0 

5 




25. See the Study Guide. 



Section 7.2, page 408 

1. a. 5x^ + |xix 2 + xf b. 185 c. 16 


3. a. 







3 

-3 

4" 


1 

0 

3 

1 

<N 

5. a. 

-3 

2 

-2 

b. 

3 

0 

-5 


_1 

-2 

-5 


<N 

_1 

-5 

1 

0 



x = P y, where P = 




. y T Dy = 6y?-4y% 

































































In Exercises 9-14, other answers (change of variables and new 
quadratic form) are possible. 


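9. Positive definite; eigenvalues are 6 and 2
Change of variable: $x = Py$, with $P = \frac{1}{\sqrt{2}}\begin{bmatrix} -1 & 1 \\ 1 & 1 \end{bmatrix}$
New quadratic form: $6y_1^2 + 2y_2^2$

11. Indefinite; eigenvalues are 3 and $-2$
Change of variable: $x = Py$, with $P = \frac{1}{\sqrt{5}}\begin{bmatrix} -2 & 1 \\ 1 & 2 \end{bmatrix}$
New quadratic form: $3y_1^2 - 2y_2^2$

13. Positive semidefinite; eigenvalues are 10 and 0
Change of variable: $x = Py$, with $P = \frac{1}{\sqrt{10}}\begin{bmatrix} 1 & 3 \\ -3 & 1 \end{bmatrix}$
New quadratic form: $10y_1^2$

15. [M] Negative definite; eigenvalues are $-13$, $-9$, $-7$, $-1$
Change of variable: $x = Py$;

$P = \begin{bmatrix} 0 & -1/2 & 0 & 3/\sqrt{12} \\ 0 & 1/2 & -2/\sqrt{6} & 1/\sqrt{12} \\ -1/\sqrt{2} & 1/2 & 1/\sqrt{6} & 1/\sqrt{12} \\ 1/\sqrt{2} & 1/2 & 1/\sqrt{6} & 1/\sqrt{12} \end{bmatrix}$

New quadratic form: $-13y_1^2 - 9y_2^2 - 7y_3^2 - y_4^2$

17. [M] Positive definite; eigenvalues are 1 and 21
Change of variable: $x = Py$;

$P = \frac{1}{\sqrt{50}}\begin{bmatrix} 4 & 3 & 4 & -3 \\ 5 & 0 & -5 & 0 \\ 3 & -4 & 3 & 4 \\ 0 & 5 & 0 & 5 \end{bmatrix}$

New quadratic form: $y_1^2 + y_2^2 + 21y_3^2 + 21y_4^2$

19. 8 21. See the Study Guide.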

23. Write the characteristic polynomial in two ways:

$\det(A - \lambda I) = \det\begin{bmatrix} a - \lambda & b \\ b & d - \lambda \end{bmatrix} = \lambda^2 - (a + d)\lambda + ad - b^2$

3. a. 9 


b. $\pm$


1/3 

2/3 

2/3 


c. 6 


5. a. 6 


b. $\pm$


1/V2 

1/V2 


c. —4 


7. ± 


1/3 

2/3 

2/3 


9. $5 + \sqrt{5}$ 11. 3


13. Hint: If $m = M$, take $\alpha = 0$ in the formula for x. That is,
let $x = u_n$, and verify that $x^TAx = m$. If $m < M$ and if t is
a number between m and M, then $0 \le t - m \le M - m$ and
$0 \le (t - m)/(M - m) \le 1$. So let $\alpha = (t - m)/(M - m)$.
Solve the expression for $\alpha$ to see that $t = (1 - \alpha)m + \alpha M$.

As $\alpha$ goes from 0 to 1, t goes from m to M. Construct x as
in the statement of the exercise, and verify its properties.


15. [M] a. 9 


b. 


’ —2/s/6 
0 

1/V6 

\/s/6 


c. 3 


17. [M] a. 34 b. 


' 1/2 
1/2 
1/2 
L 1/2 


c. 26 


Section 7.4, page 425 

1. 3,1 


3. 4,1 


The answers in Exercises 5-13 are not the only possibilities. 


5. 


1 

0 


0 

1 



"2 

1 

O 


" 1 

1 

O 

1 

1 

O 

1 

O 


1 

O 

1 


and 



7. 

1/V5 

-2 /s/5 

"3 

0" 

$(\lambda - \lambda_1)(\lambda - \lambda_2) = \lambda^2 - (\lambda_1 + \lambda_2)\lambda + \lambda_1\lambda_2$

2/s/5 

1/ s/5 

0 

2 


Equate coefficients to obtain $\lambda_1 + \lambda_2 = a + d$ and
$\lambda_1\lambda_2 = ad - b^2 = \det A$.

25. Exercise 28 in Section 7.1 showed that $B^TB$ is symmetric.


x 


2/ s/5 1/ s/5 

-l/s/5 2/ s/5 


Also, $x^TB^TBx = (Bx)^TBx = \|Bx\|^2 \ge 0$, so the quadratic


-1 

0 

0 


" 3 V 2 

0 

form is positive semidefinite, and we say that the matrix 

9. 

0 

0 

1 


0 

a/2 

$B^TB$ is positive semidefinite. Hint: To show that $B^TB$ is


0 

1 

0 


0 

0 


positive definite when B is square and invertible, suppose
that $x^TB^TBx = 0$ and deduce that $x = 0$.

27. Hint: Show that $A + B$ is symmetric and the quadratic
form $x^T(A + B)x$ is positive definite.

Section 7.3, page 415 


X 


-1/V2 

1/V2 


1/V2 

1 /V 2 


11 . 


1/3 

2/3 

-2/3 

2/3 

1/3 

2/3 

2/3 

2/3 

1/3 


X 


-1/3 2/3 2/3 

2/3 -1/3 2/3 

2/3 2/3 -1/3 

3/710 -1/a/I0 

i/vto 3/yio 



V90 

0 


0 

0 


0 

0_ 


1. x = P y, where P = 




































































1 /V 2 

- 1 /V 2 

"5 0 0" 

1 /V 2 

1 /V 2 

_° 3 °_ 



1 /V 2 

-1/V18 

- 2/3 


1/V2 0 

1/V18 -4/VI8 
2/3 1/3 


15. a. rank 4 = 2 



1 

0 

• 

1 _ 


"-.78“ 

b. Basis for Col A: 

.37 

5 

-.33 


1 

OO 

• 

_ 1 


-.52 


Basis for Nul 4: 



(Remember that $V^T$ appears in the SVD.)


17. If U is an orthogonal matrix then $\det U = \pm 1$. If
$A = U\Sigma V^T$ and A is square, then so are U, $\Sigma$, and V.
Hence $\det A = \det U \det \Sigma \det V^T$
$= \pm 1 \det \Sigma = \pm\sigma_1 \cdots \sigma_n$


19. Hint: Since U and V are orthogonal,

$A^TA = (U\Sigma V^T)^TU\Sigma V^T = V\Sigma^TU^TU\Sigma V^T$
$= V(\Sigma^T\Sigma)V^{-1}$

Thus V diagonalizes $A^TA$. What does this tell you about V?

21. The right singular vector $v_1$ is an eigenvector for the largest
eigenvalue $\lambda_1$ of $A^TA$. By Theorem 7 in Section 7.3, the
second largest eigenvalue, $\lambda_2$, is the maximum of $x^T(A^TA)x$ over
all unit vectors orthogonal to $v_1$. Since
$x^T(A^TA)x = \|Ax\|^2$, the square root of $\lambda_2$, which is the
second largest singular value, is the maximum of $\|Ax\|$ over
all unit vectors orthogonal to $v_1$.

23. Hint: Use a column-row expansion of $(U\Sigma)V^T$.


25. Hint: Consider the SVD for the standard matrix of T, say,
$A = U\Sigma V^T = U\Sigma V^{-1}$. Let $B = \{v_1, \ldots, v_n\}$ and
$C = \{u_1, \ldots, u_m\}$ be bases constructed from the columns of
V and U, respectively. Compute the matrix for T relative to
B and C, as in Section 5.4. To do this, you must show that
$V^{-1}v_j = e_j$, the jth column of $I_n$.

27. [M]

-.57 -.65 

-.42 

.27 

.63 -.24 

-.68 

-.29 

.07 -.63 

.53 

-.56 

-.51 .34 

-.29 

-.73 

" 16.46 

0 

0 

0 12.16 

0 

0 

0 

4.87 

0 

0 

0 


0 

0 

0 

4.31 


0 

0 

0 

0 


x 


-.10 

.61 

-.21 

-.52 

.55 

-.39 

.29 

.84 

-.14 

-.19 

-.74 

-.27 

-.07 

.38 

.49 

.41 

-.50 

.45 

-.23 

.58 

-.36 

-.48 

-.19 

-.72 

-.29 


29. [M] 25.9343, 16.7554, 11.2917, 1.0785, .00037793;

$\sigma_1/\sigma_5 = 68{,}622$


Section 7.5, page 432 


1. M = 


1 

2 

;B = 

"7 

10 -6 

-9 

-10 

OO 

1- 

O 

1_ 

2 

-4 -1 

5 

3 

Ul 

1_ 


5 


5 = 


86 -27 
-27 16 


3. $\begin{bmatrix} .95 \\ -.32 \end{bmatrix}$ for $\lambda = 95.2$, $\begin{bmatrix} .32 \\ .95 \end{bmatrix}$ for $\lambda = 6.8$
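
The principal components in Exercise 3 come from the eigenvalues and unit eigenvectors of the 2 x 2 covariance matrix S shown in Exercise 1. The sketch below computes them in closed form; the function name `symmetric_2x2_eigen` and the sign convention (first entry of each eigenvector made positive) are assumptions made for this illustration.

```python
import math


def symmetric_2x2_eigen(a, b, d):
    """Return [(eigenvalue, unit eigenvector), ...] for [[a, b], [b, d]], largest first."""
    if b == 0:
        # Already diagonal: the standard basis vectors are eigenvectors.
        if a >= d:
            return [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
        return [(d, (0.0, 1.0)), (a, (1.0, 0.0))]
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    pairs = []
    for lam in (mean + radius, mean - radius):
        v = (b, lam - a)  # satisfies (A - lam*I) v = 0 when b != 0
        norm = math.hypot(v[0], v[1])
        sign = 1.0 if v[0] > 0 else -1.0
        pairs.append((lam, (sign * v[0] / norm, sign * v[1] / norm)))
    return pairs


# Covariance matrix S = [[86, -27], [-27, 16]] from Exercise 1.
(lam1, u1), (lam2, u2) = symmetric_2x2_eigen(86, -27, 16)
print(round(lam1, 1), [round(c, 2) for c in u1])
print(round(lam2, 1), [round(c, 2) for c in u2])
print(round(100 * lam1 / (lam1 + lam2), 1))  # percent of total variance
```

The last line reports the share of the total variance (the trace of S) carried by the first principal component, which is the quantity quoted in Exercise 7.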


5. [M] (.130, .874, .468), 75.9% of the variance 

7. $y_1 = .95x_1 - .32x_2$; $y_1$ explains 93.3% of the variance.

9. $c_1 = 1/3$, $c_2 = 2/3$, $c_3 = 2/3$; the variance of y is 9.

11. a. If w is the vector in $\mathbb{R}^N$ with a 1 in each position, then

$[\, X_1 \; \cdots \; X_N \,]\, w = X_1 + \cdots + X_N = 0$

because the $X_k$ are in mean-deviation form. Then

$[\, Y_1 \; \cdots \; Y_N \,]\, w = [\, P^TX_1 \; \cdots \; P^TX_N \,]\, w$    By definition
$= P^T[\, X_1 \; \cdots \; X_N \,]\, w = P^T0 = 0$

That is, $Y_1 + \cdots + Y_N = 0$, so the $Y_k$ are in
mean-deviation form.

b. Hint: Because the $X_j$ are in mean-deviation form, the
covariance matrix of the $X_j$ is

$\frac{1}{N - 1}[\, X_1 \; \cdots \; X_N \,][\, X_1 \; \cdots \; X_N \,]^T$

Compute the covariance matrix of the $Y_j$, using part (a).


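13. If $B = [\, \hat{X}_1 \; \cdots \; \hat{X}_N \,]$, then

$S = \frac{1}{N - 1}BB^T = \frac{1}{N - 1}[\, \hat{X}_1 \; \cdots \; \hat{X}_N \,]\begin{bmatrix} \hat{X}_1^T \\ \vdots \\ \hat{X}_N^T \end{bmatrix}$

$= \frac{1}{N - 1}\sum_{k=1}^{N}\hat{X}_k\hat{X}_k^T = \frac{1}{N - 1}\sum_{k=1}^{N}(X_k - M)(X_k - M)^T$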


Chapter 7 Supplementary Exercises, page 434 


1. a. 

T 

b. 

F 

c. T 

d. 

F 

e. 

F 

f. 

F 

§• 

F 

h. 

T 

i. F 

• 

J- 

F 

k. 

F 

1. 

F 

m. T 

n. 

F 

0 . T 

P- 

T 

q- 

F 




3. If rank A = r , then dim Nul A = n — r , by the Rank 
Theorem. So 0 is an eigenvalue of multiplicity n — r. 
Hence, of the n terms in the spectral decomposition of A, 
exactly n — r are zero. The remaining r terms 
(corresponding to the nonzero eigenvalues) are all rank 1 
matrices, as mentioned in the discussion of the spectral 
decomposition. 

5. If $Av = \lambda v$ for some nonzero $\lambda$, then

$v = \lambda^{-1}Av = A(\lambda^{-1}v)$, which shows that v is a linear
combination of the columns of A.









































7. Hint: If $A = R^TR$, where R is invertible, then A is positive
definite, by Exercise 25 in Section 7.2. Conversely, suppose
that A is positive definite. Then by Exercise 26 in Section
7.2, $A = B^TB$ for some positive definite matrix B. Explain
why B admits a QR factorization, and use it to create the
Cholesky factorization of A.

9. If A is $m \times n$ and x is in $\mathbb{R}^n$, then $x^TA^TAx = (Ax)^T(Ax) =
\|Ax\|^2 \ge 0$. Thus $A^TA$ is positive semidefinite. By
Exercise 22 in Section 6.5, rank $A^TA$ = rank A.


11. Hint: Write an SVD of A in the form $A = U\Sigma V^T = PQ$,
where $P = U\Sigma U^T$ and $Q = UV^T$. Show that P is
symmetric and has the same eigenvalues as $\Sigma$. Explain why
Q is an orthogonal matrix.


13. a. If $b = Ax$, then $x^+ = A^+b = A^+Ax$. By
Exercise 12(a), $x^+$ is the orthogonal projection of x
onto Row A.

b. From (a) and then Exercise 12(c),

$Ax^+ = A(A^+Ax) = (AA^+A)x = Ax = b$.

c. Since $x^+$ is the orthogonal projection onto Row A, the
Pythagorean Theorem shows that

$\|u\|^2 = \|x^+\|^2 + \|u - x^+\|^2$. Part (c) follows
immediately.


-2 -14 
-2 -14 
-2 6 
2 -6 
4 -12 

The reduced echelon form of 


13 

13 


.7 

13 

13 


.7 

-7 

-7 

yv 

, x = 

-.8 

7 

7 


.8 

-6 

-6 


.6 

A " 

is the same as the 


15. [M] A+ = 


1 


40 


reduced echelon form of A , except for an extra row of 
zeros. So adding scalar multiples of the rows of A to x T can 
produce the zero vector, which shows that x T is in Row A. 


Basis for Nul A: 


"-1 " 


1 

0 

1_ 

1 


0 

0 

5 

1 

0 


1 

1 

0 

1_ 


1 

0 

_1 


7. a. $p_1 \in$ Span S, but $p_1 \notin$ aff S

b. $p_2 \in$ Span S, and $p_2 \in$ aff S

c. $p_3 \notin$ Span S, so $p_3 \notin$ aff S


9. Vi 




. Other answers are possible. 


11. See the Study Guide. 

13. Span $\{v_2 - v_1, v_3 - v_1\}$ is a plane if and only if
$\{v_2 - v_1, v_3 - v_1\}$ is linearly independent. Suppose $c_2$ and
$c_3$ satisfy $c_2(v_2 - v_1) + c_3(v_3 - v_1) = 0$. Show that this
implies $c_2 = c_3 = 0$.


15. Let $S = \{x : Ax = b\}$. To show that S is affine, it suffices
to show that S is a flat, by Theorem 3. Let
$W = \{x : Ax = 0\}$. Then W is a subspace of $\mathbb{R}^n$, by
Theorem 2 in Section 4.2 (or Theorem 12 in Section 2.8).
Since $S = W + p$, where p satisfies $Ap = b$, by Theorem
6 in Section 1.5, S is a translate of W, and hence S is a flat.


17. A suitable set consists of any three vectors that are not 
collinear and have 5 as their third entry. If 5 is their third 
entry, they lie in the plane z = 5. If the vectors are not 
collinear, their affine hull cannot be a line, so it must be the 
plane. 

19. If $p, q \in f(S)$, then there exist $r, s \in S$ such that $f(r) = p$
and $f(s) = q$. Given any $t \in \mathbb{R}$, we must show that
$z = (1 - t)p + tq$ is in $f(S)$. Now use definitions of p and
q, and the fact that f is linear. The complete proof is
presented in the Study Guide.

21. Since B is affine, Theorem 2 implies that B contains all
affine combinations of points of B. Hence B contains all
affine combinations of points of A. That is, aff $A \subset B$.

23. Since $A \subset (A \cup B)$, it follows from Exercise 22 that
aff $A \subset$ aff $(A \cup B)$. Similarly, aff $B \subset$ aff $(A \cup B)$, so
[aff $A \cup$ aff $B$] $\subset$ aff $(A \cup B)$.

25. To show that $D \subset E \cap F$, show that $D \subset E$ and $D \subset F$.
The complete proof is presented in the Study Guide.


Section 8.2, page 454 


Chapter 8 

Section 8.1, page 444 

1. Some possible answers: $y = 2v_1 - 1.5v_2 + .5v_3$,
$y = 2v_1 - 2v_3 + v_4$, $y = 2v_1 + 3v_2 - 7v_3 + 3v_4$

3. $y = -3v_1 + 2v_2 + 2v_3$. The weights sum to 1, so this is an
affine sum.

5. a. $p_1 = 3b_1 - b_2 - b_3 \in$ aff S since the coefficients sum
to 1.

b. $p_2 = 2b_1 + 0b_2 + b_3 \notin$ aff S since the coefficients do
not sum to 1.

c. $p_3 = -b_1 + 2b_2 + 0b_3 \in$ aff S since the coefficients
sum to 1.


1. Affinely dependent and $2v_1 + v_2 - 3v_3 = 0$

3. The set is affinely independent. If the points are called $v_1$,
$v_2$, $v_3$, and $v_4$, then $\{v_1, v_2, v_3\}$ is a basis for $\mathbb{R}^3$ and
$v_4 = 16v_1 + 5v_2 - 3v_3$, but the weights in the linear
combination do not sum to 1.

5. $-4v_1 + 5v_2 - 4v_3 + 3v_4 = 0$

7. The barycentric coordinates are $(-2, 4, -1)$.

9. See the Study Guide.

11. When a set of five points is translated by subtracting, say,
the first point, the new set of four points must be linearly
dependent, by Theorem 8 in Section 1.7, because the four
points are in $\mathbb{R}^3$. By Theorem 5, the original set of five
points is affinely dependent.



















13. If $\{v_1, v_2\}$ is affinely dependent, then there exist $c_1$ and $c_2$,
not both zero, such that $c_1 + c_2 = 0$ and $c_1v_1 + c_2v_2 = 0$.
Show that this implies $v_1 = v_2$. For the converse, suppose
$v_1 = v_2$ and select specific $c_1$ and $c_2$ that show their affine
dependence. The details are in the Study Guide.


15. a. The vectors $v_2 - v_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $v_3 - v_1 = \begin{bmatrix} 3 \\ -2 \end{bmatrix}$ are
not multiples and hence are linearly independent. By
Theorem 5, S is affinely independent.

b.

Pi ^ H’!’i)>P2^ (Md)-Ps 

P4 ^ D’Ps 




(- 1 -) 
V4 ’ 8 ’ 8/ 


c. $p_6$ is $(-, -, +)$, $p_7$ is $(0, +, -)$, and $p_8$ is $(+, +, -)$.


17. Suppose $S = \{b_1, \ldots, b_k\}$ is an affinely independent set.
Then equation (7) has a solution, because p is in aff S.
Hence equation (8) has a solution. By Theorem 5, the
homogeneous forms of the points in S are linearly
independent. Thus (8) has a unique solution. Then (7) also
has a unique solution, because (8) encodes both equations
that appear in (7).

The following argument mimics the proof of Theorem 7 in
Section 4.4. If $S = \{b_1, \ldots, b_k\}$ is an affinely independent
set, then scalars $c_1, \ldots, c_k$ exist that satisfy (7), by
definition of aff S. Suppose x also has the representation

$x = d_1b_1 + \cdots + d_kb_k$ and $d_1 + \cdots + d_k = 1$    (7a)

for scalars $d_1, \ldots, d_k$. Then subtraction produces the
equation

$0 = x - x = (c_1 - d_1)b_1 + \cdots + (c_k - d_k)b_k$    (7b)

The weights in (7b) sum to 0 because the c's and the d's
separately sum to 1. This is impossible, unless each weight
in (8) is 0, because S is an affinely independent set. This
proves that $c_i = d_i$ for $i = 1, \ldots, k$.


19. If $\{p_1, p_2, p_3\}$ is an affinely dependent set, then there exist
scalars $c_1$, $c_2$, and $c_3$, not all zero, such that
$c_1p_1 + c_2p_2 + c_3p_3 = 0$ and $c_1 + c_2 + c_3 = 0$. Now use
the linearity of f.


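21. Let $a = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix}$, $b = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$, and $c = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}$. Then

$\det[\, \tilde{a} \; \tilde{b} \; \tilde{c} \,] = \det\begin{bmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ 1 & 1 & 1 \end{bmatrix} = \det\begin{bmatrix} a_1 & a_2 & 1 \\ b_1 & b_2 & 1 \\ c_1 & c_2 & 1 \end{bmatrix}$, by the transpose property of the
determinant (Theorem 5 in Section 3.2). By Exercise 30 in
Section 3.3, this determinant equals 2 times the area of the
triangle with vertices at a, b, and c.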


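23. If $[\, \tilde{a} \; \tilde{b} \; \tilde{c} \,]\begin{bmatrix} r \\ s \\ t \end{bmatrix} = \tilde{p}$, then Cramer's rule gives
$r = \det[\, \tilde{p} \; \tilde{b} \; \tilde{c} \,]/\det[\, \tilde{a} \; \tilde{b} \; \tilde{c} \,]$. By Exercise 21, the
numerator of this quotient is twice the area of $\triangle pbc$, and
the denominator is twice the area of $\triangle abc$. This proves the
formula for r. The other formulas are proved using
Cramer's rule for s and t.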
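
The determinant formulas in Exercises 21 and 23 translate directly into a short computation of barycentric coordinates in the plane. The sketch below is illustrative only; `homogeneous_det` and `barycentric` are names chosen here, and the sample triangle is made up for the demonstration.

```python
def homogeneous_det(u, v, w):
    """det [u~ v~ w~], where u~ = (u1, u2, 1) is the homogeneous form of a point in R^2."""
    return u[0] * (v[1] - w[1]) - v[0] * (u[1] - w[1]) + w[0] * (u[1] - v[1])


def barycentric(a, b, c, p):
    """Barycentric coordinates (r, s, t) of p relative to the triangle abc, by Cramer's rule."""
    total = homogeneous_det(a, b, c)  # plus or minus twice the area of triangle abc
    if total == 0:
        raise ValueError("a, b, c must be affinely independent")
    r = homogeneous_det(p, b, c) / total
    s = homogeneous_det(a, p, c) / total
    t = homogeneous_det(a, b, p) / total
    return r, s, t


# Hypothetical triangle and point, chosen only to illustrate the formulas.
a, b, c, p = (0, 0), (4, 0), (0, 4), (1, 1)
print(abs(homogeneous_det(a, b, c)) / 2)  # area of triangle abc
print(barycentric(a, b, c, p))
```

The three coordinates always sum to 1, and p lies inside the triangle exactly when all three are positive.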

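25. The intersection point is $x(4) = -.1\begin{bmatrix} 1 \\ 3 \\ -6 \end{bmatrix} + .6\begin{bmatrix} 7 \\ 3 \\ -5 \end{bmatrix} + .5\begin{bmatrix} 3 \\ 9 \\ -2 \end{bmatrix} = \begin{bmatrix} 5.6 \\ 6.0 \\ -3.4 \end{bmatrix}$.

It is not inside the triangle.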

Section 8.3, page 461 

1. See the Study Guide. 

3. None are in conv S.

5. p, = -gVi + \y 2 + |v 3 + |v 4 , so p, £ conv S. 

P 2 = ! V 1 + | v 2 + 4 3 + I v 4 , SO p 2 e conv S. 

7. a. The barycentric coordinates of p 1 , p 2 , p 3 , and p 4 are, 

respectively, U, i U, ( 0 , U, ( 5 ,- 7 , 7 ),and 
(I 2 _i\ 

V 2 ’ 4’ 4/• 

b. $p_3$ and $p_4$ are outside conv T. $p_1$ is inside conv T.
$p_2$ is on the edge $\overline{v_2v_3}$ of conv T.

9. $p_1$ and $p_3$ are outside the tetrahedron conv S. $p_2$ is on the
face containing the vertices $v_2$, $v_3$, and $v_4$. $p_4$ is inside
conv S. $p_5$ is on the edge between $v_1$ and $v_3$.

11. See the Study Guide. 

13. If $p, q \in f(S)$, then there exist $r, s \in S$ such that $f(r) = p$
and $f(s) = q$. The goal is to show that the line segment
$y = (1 - t)p + tq$, for $0 \le t \le 1$, is in $f(S)$. Use the
linearity of f and the convexity of S to show that
$y = f(w)$ for some w in S. This will show that y is in $f(S)$
and that $f(S)$ is convex.

15. p = |vi + ^v 2 + }v 4 and p = \y x + \\ 2 + |v 3 . 

17. Suppose $A \subset B$, where B is convex. Then, since B is
convex, Theorem 7 implies that B contains all convex
combinations of points of B. Hence B contains all convex
combinations of points of A. That is, conv $A \subset B$.

19. a. Use Exercise 18 to show that conv A and conv B are
both subsets of conv $(A \cup B)$. This will imply that their
union is also a subset of conv $(A \cup B)$.

b. One possibility is to let A be two adjacent corners of a
square and let B be the other two corners. Then what is
(conv A) $\cup$ (conv B), and what is conv $(A \cup B)$?



Po 


23. $g(t) = (1 - t)f_0(t) + tf_1(t)$
$= (1 - t)[(1 - t)p_0 + tp_1] + t[(1 - t)p_1 + tp_2]$
$= (1 - t)^2p_0 + 2t(1 - t)p_1 + t^2p_2$.


































The sum of the weights in the linear combination for g is
$(1 - t)^2 + 2t(1 - t) + t^2$, which equals
$(1 - 2t + t^2) + (2t - 2t^2) + t^2 = 1$. The weights are each
between 0 and 1 when $0 \le t \le 1$, so $g(t)$ is in
conv $\{p_0, p_1, p_2\}$.
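
The identity in Exercise 23 can be checked numerically by evaluating the curve both ways: by repeated linear interpolation and by the expanded weights. This is only a sketch; the function names `lerp`, `g_nested`, and `g_expanded` and the sample control points are assumptions introduced for the illustration.

```python
def lerp(p, q, t):
    """Point (1 - t) p + t q on the segment from p to q."""
    return tuple((1 - t) * pi + t * qi for pi, qi in zip(p, q))


def g_nested(p0, p1, p2, t):
    """g(t) = (1 - t) f0(t) + t f1(t), with f0 and f1 the two linear interpolants."""
    return lerp(lerp(p0, p1, t), lerp(p1, p2, t), t)


def quadratic_weights(t):
    """Weights of p0, p1, p2 in the expanded form of g(t)."""
    return ((1 - t) ** 2, 2 * t * (1 - t), t ** 2)


def g_expanded(p0, p1, p2, t):
    """g(t) = (1 - t)^2 p0 + 2t(1 - t) p1 + t^2 p2."""
    w0, w1, w2 = quadratic_weights(t)
    return tuple(w0 * a + w1 * b + w2 * c for a, b, c in zip(p0, p1, p2))


# Hypothetical control points used only for the comparison.
p0, p1, p2 = (0.0, 0.0), (1.0, 2.0), (3.0, 1.0)
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(t, g_nested(p0, p1, p2, t), g_expanded(p0, p1, p2, t), sum(quadratic_weights(t)))
```

For each t in [0, 1] the two evaluations agree up to rounding and the weights sum to 1, which is why the point lies in the convex hull of the control points.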


29. Let $x, y \in B(p, \delta)$ and suppose $z = (1 - t)x + ty$, where
$0 \le t \le 1$. Then show that

$\|z - p\| = \|[(1 - t)x + ty] - p\|$
$= \|(1 - t)(x - p) + t(y - p)\| < \delta$


Section 8.4, page 469 
1. $f(x_1, x_2) = 3x_1 + 4x_2$ and $d = 13$


3. a. Open 
d. Closed 


b. Closed 
e. Closed 


c. Neither 


5. a. Not compact, convex 

b. Compact, convex 

c. Not compact, convex 

d. Not compact, not convex 

e. Not compact, convex 


7. a. $n = \begin{bmatrix} 0 \\ 2 \\ 3 \end{bmatrix}$ or a multiple

b. $f(x) = 2x_2 + 3x_3$, $d = 11$

9. a. $n = \begin{bmatrix} 3 \\ -1 \\ 2 \\ 1 \end{bmatrix}$ or a multiple

b. $f(x) = 3x_1 - x_2 + 2x_3 + x_4$, $d = 5$

11. $v_2$ is on the same side as 0, $v_1$ is on the other side, and $v_3$ is
in H.


13. One possibility is p = 


32“ 


“ 10“ 

-14 


-7 

0 

, Vi = 

1 

0 


0 


v 2 = 


-4 

1 

0 

1 


15. $f(x_1, x_2, x_3, x_4) = x_1 - 3x_2 + 4x_3 - 2x_4$, and $d = 5$
17. $f(x_1, x_2, x_3) = x_1 - 2x_2 + x_3$, and $d = 0$
19. $f(x_1, x_2, x_3) = -5x_1 + 3x_2 + x_3$, and $d = 0$
21. See the Study Guide.

23. $f(x_1, x_2) = 3x_1 - 2x_2$ with d satisfying $9 < d < 10$ is one
possibility.

25. $f(x, y) = 4x + y$. A natural choice for d is 12.75, which
equals $f(3, .75)$. The point $(3, .75)$ is three-fourths of the
distance between the center of $B(0, 3)$ and the center of
$B(p, 1)$.

27. Exercise 2(a) in Section 8.3 gives one possibility. Or let
$S = \{(x, y) : x^2y^2 = 1 \text{ and } y > 0\}$. Then conv S is the
upper (open) half-plane.


Section 8.5, page 481 

1. a. $m = 1$ at the point $p_1$ b. $m = 5$ at the point $p_2$
c. $m = 5$ at the point $p_3$

3. a. $m = -3$ at the point $p_3$

b. $m = 1$ on the set conv $\{p_1, p_3\}$


5. 


7. 


11 . 


c. $m = -3$ on the set conv $\{p_1, p_2\}$


-1 

O 

_i 


5 


1 

1_ 


1 

o 

1_ 

1- 

o 

1_ 

9 

1 

o 

_1 


1 

OJ 

1_ 


1 

Ul 

1_ 

—1 

o 
_1 


1 

1 


1 

so 

1_ 


1 

o 

1_ 

1- 

o 

1 _ 

9 

1 

o 

_1 

5 

1 

_1 

5 

1 

so 

_1 


9. The origin is an extreme point, but it is not a vertex. Explain 
why. 



One possibility is to let S be a square that includes part of 
the boundary but not all of it. For example, include just two 
adjacent edges. The convex hull of the profile P is a 
triangular region. 




13. a. $f_0(C^5) = 32$, $f_1(C^5) = 80$, $f_2(C^5) = 80$,
$f_3(C^5) = 40$, $f_4(C^5) = 10$, and
$32 - 80 + 80 - 40 + 10 = 2$.
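
The face counts in Exercise 13(a) can be reproduced from the standard counting formula $f_k(C^n) = 2^{n-k}\binom{n}{k}$ for the n-dimensional hypercube. That formula is not stated in this answer and is brought in here as an assumption; the function names are likewise chosen only for this sketch.

```python
import math


def hypercube_face_counts(n):
    """Return [f_0, ..., f_{n-1}] for the n-cube, using f_k = 2^(n-k) * C(n, k)."""
    return [2 ** (n - k) * math.comb(n, k) for k in range(n)]


def alternating_sum(counts):
    """Compute f_0 - f_1 + f_2 - ... for a list of face counts."""
    return sum((-1) ** k * f for k, f in enumerate(counts))


counts = hypercube_face_counts(5)
print(counts, alternating_sum(counts))
```

Calling `hypercube_face_counts(3)` gives the familiar 8 vertices, 12 edges, and 6 faces of the ordinary cube.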



15. a. $f_0(P^n) = f_0(Q) + 1$

b. $f_k(P^n) = f_k(Q) + f_{k-1}(Q)$

c. $f_{n-1}(P^n) = f_{n-2}(Q) + 1$














































17. See the Study Guide. 

19. Let S be convex and let $x \in cS + dS$, where $c > 0$ and
$d > 0$. Then there exist $s_1$ and $s_2$ in S such that
$x = cs_1 + ds_2$. But then

$x = cs_1 + ds_2 = (c + d)\left(\frac{c}{c + d}s_1 + \frac{d}{c + d}s_2\right)$.

Now show that the expression on the right side is a member
of $(c + d)S$.

For the converse, pick a typical point in $(c + d)S$ and
show it is in $cS + dS$.

21. Hint: Suppose A and B are convex. Let $x, y \in A + B$.

Then there exist $a, c \in A$ and $b, d \in B$ such that $x = a + b$
and $y = c + d$. For any t such that $0 \le t \le 1$, show that

$w = (1 - t)x + ty = (1 - t)(a + b) + t(c + d)$

represents a point in $A + B$.

Section 8.6, page 492 

1. The control points for $x(t) + b$ should be $p_0 + b$, $p_1 + b$,
and $p_3 + b$. Write the Bezier curve through these points,
and show algebraically that this curve is $x(t) + b$. See the
Study Guide.

3. a. $x'(t) = (-3 + 6t - 3t^2)p_0 + (3 - 12t + 9t^2)p_1 +
(6t - 9t^2)p_2 + 3t^2p_3$, so
$x'(0) = -3p_0 + 3p_1 = 3(p_1 - p_0)$, and
$x'(1) = -3p_2 + 3p_3 = 3(p_3 - p_2)$. This shows that the
tangent vector $x'(0)$ points in the direction from $p_0$ to $p_1$
and is three times the length of $p_1 - p_0$. Likewise, $x'(1)$
points in the direction from $p_2$ to $p_3$ and is three times
the length of $p_3 - p_2$. In particular, $x'(1) = 0$ if and
only if $p_3 = p_2$.

b. $x''(t) = (6 - 6t)p_0 + (-12 + 18t)p_1
+ (6 - 18t)p_2 + 6tp_3$, so that
$x''(0) = 6p_0 - 12p_1 + 6p_2 = 6(p_0 - p_1) + 6(p_2 - p_1)$
and
$x''(1) = 6p_1 - 12p_2 + 6p_3 = 6(p_1 - p_2) + 6(p_3 - p_2)$
For a picture of $x''(0)$, construct a coordinate system
with the origin at $p_1$, temporarily, label $p_0$ as $p_0 - p_1$,
and label $p_2$ as $p_2 - p_1$. Finally, construct a line from
this new origin through the sum of $p_0 - p_1$ and $p_2 - p_1$,
extended out a bit. That line points in the direction of
$x''(0)$.





5. a. From Exercise 3(a) or equation (9) in the text,
$x'(1) = 3(p_3 - p_2)$

Use the formula for $x'(0)$, with the control points from
$y(t)$, and obtain

$y'(0) = -3p_3 + 3p_4 = 3(p_4 - p_3)$

For $C^1$ continuity, $3(p_3 - p_2) = 3(p_4 - p_3)$, so
$p_3 = (p_4 + p_2)/2$, and $p_3$ is the midpoint of the line
segment from $p_2$ to $p_4$.

b. If $x'(1) = y'(0) = 0$, then $p_2 = p_3$ and $p_3 = p_4$. Thus,
the “line segment” from $p_2$ to $p_4$ is just the point $p_3$.
[Note: In this case, the combined curve is still $C^1$
continuous, by definition. However, some choices of the
other “control” points, $p_0$, $p_1$, $p_5$, and $p_6$, can produce a
curve with a visible corner at $p_3$, in which case the
curve is not $G^1$ continuous at $p_3$.]

7. Hint: Use x"{t) from Exercise 3 and adapt this for the 
second curve to see that 


y"(0 = 6(1 - f)p 3 + 6(—2 + 3i)p 4 + 6(1 - 3i)p 5 + 6ip 6 

Then set x"(l) = y"(0). Since the curve is C 1 continuous 
at p 3 , Exercise 5(a) says that the point p 3 is the midpoint of 
the segment from p 2 to p 4 . This implies that 
p 4 — p 3 = p 3 — p 2 . Use this substitution to show that p 4 
and p 5 are uniquely determined by p 1 , p 2 , and p 3 . Only p 6 
can be chosen arbitrarily. 
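One way to carry out the hint is sketched below. Substituting $\mathbf{p}_4 = 2\mathbf{p}_3 - \mathbf{p}_2$ into $\mathbf{x}''(1) = \mathbf{y}''(0)$ appears to give $\mathbf{p}_5 = \mathbf{p}_1 - 4\mathbf{p}_2 + 4\mathbf{p}_3$. This closed form is worked out here rather than quoted from the text, and the names `second_derivative` and `c2_controls` and the sample points are hypothetical.

```python
def second_derivative(c0, c1, c2, c3, t):
    """Second derivative of a cubic Bezier curve with control points c0..c3."""
    w = (6 - 6*t, -12 + 18*t, 6 - 18*t, 6*t)
    pts = (c0, c1, c2, c3)
    return tuple(sum(wi * p[k] for wi, p in zip(w, pts)) for k in range(len(c0)))

def c2_controls(p1, p2, p3):
    """Control points p4, p5 forced by C^1 and C^2 continuity at p3 (assumed formulas)."""
    p4 = tuple(2*c - b for b, c in zip(p2, p3))               # p4 = 2 p3 - p2
    p5 = tuple(a - 4*b + 4*c for a, b, c in zip(p1, p2, p3))  # p5 = p1 - 4 p2 + 4 p3
    return p4, p5

# Hypothetical control points for the first curve; p6 is free.
p0, p1, p2, p3 = (0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)
p4, p5 = c2_controls(p1, p2, p3)
p6 = (7.0, 1.0)

x_dd_end = second_derivative(p0, p1, p2, p3, 1.0)    # x''(1)
y_dd_start = second_derivative(p3, p4, p5, p6, 0.0)  # y''(0)
```

The two second derivatives match for any choice of `p6`, since its weight $6t$ vanishes at $t = 0$.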


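9. Write a vector of the polynomial weights for $\mathbf{x}(t)$, expand
the polynomial weights, and factor the vector as $M_B\mathbf{u}(t)$:

$$
\begin{bmatrix}
1 - 4t + 6t^2 - 4t^3 + t^4 \\
4t - 12t^2 + 12t^3 - 4t^4 \\
6t^2 - 12t^3 + 6t^4 \\
4t^3 - 4t^4 \\
t^4
\end{bmatrix}
=
\begin{bmatrix}
1 & -4 & 6 & -4 & 1 \\
0 & 4 & -12 & 12 & -4 \\
0 & 0 & 6 & -12 & 6 \\
0 & 0 & 0 & 4 & -4 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} 1 \\ t \\ t^2 \\ t^3 \\ t^4 \end{bmatrix},
\qquad
M_B =
\begin{bmatrix}
1 & -4 & 6 & -4 & 1 \\
0 & 4 & -12 & 12 & -4 \\
0 & 0 & 6 & -12 & 6 \\
0 & 0 & 0 & 4 & -4 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}
$$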
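The factorization can be sanity-checked by comparing $M_B\mathbf{u}(t)$ with the quartic weights computed directly. This is a small sketch, assuming the weights are the degree-4 Bernstein polynomials $\binom{4}{i}t^i(1-t)^{4-i}$; the function names are hypothetical.

```python
from math import comb

# Quartic Bezier basis matrix M_B, as in the factorization above.
M_B = [
    [1, -4,   6,  -4,  1],
    [0,  4, -12,  12, -4],
    [0,  0,   6, -12,  6],
    [0,  0,   0,   4, -4],
    [0,  0,   0,   0,  1],
]

def weights_from_matrix(t):
    """Compute M_B u(t), where u(t) = (1, t, t^2, t^3, t^4)."""
    u = [t**k for k in range(5)]
    return [sum(m * uk for m, uk in zip(row, u)) for row in M_B]

def bernstein_weights(t):
    """Compute the degree-4 Bernstein polynomials directly (assumed form of the weights)."""
    return [comb(4, i) * t**i * (1 - t)**(4 - i) for i in range(5)]

max_gap = max(
    abs(a - b)
    for t in (0.0, 0.25, 0.5, 0.8, 1.0)
    for a, b in zip(weights_from_matrix(t), bernstein_weights(t))
)
```

The value `max_gap` should be zero up to floating-point rounding, and the weights at each $t$ should sum to 1.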
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11. See the Study Guide. 

13. a. Hint: Use the fact that $\mathbf{q}_0 = \mathbf{p}_0$.

b. Multiply the first and last parts of equation (13) by 8/3
and solve for $8\mathbf{q}_2$.

c. Use equation (8) to substitute for $8\mathbf{q}_3$ and then apply
part (a).


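15. a. From equation (11), $\mathbf{y}'(1) = .5\mathbf{x}'(.5) = \mathbf{z}'(0)$.

b. Observe that $\mathbf{y}'(1) = 3(\mathbf{q}_3 - \mathbf{q}_2)$. This follows from
equation (9), with $\mathbf{y}(t)$ and its control points in place of
$\mathbf{x}(t)$ and its control points. Similarly, for $\mathbf{z}(t)$ and its
control points, $\mathbf{z}'(0) = 3(\mathbf{r}_1 - \mathbf{r}_0)$. By part (a),
$3(\mathbf{q}_3 - \mathbf{q}_2) = 3(\mathbf{r}_1 - \mathbf{r}_0)$. Replace $\mathbf{r}_0$ by $\mathbf{q}_3$, and obtain
$\mathbf{q}_3 - \mathbf{q}_2 = \mathbf{r}_1 - \mathbf{q}_3$, and hence $\mathbf{q}_3 = (\mathbf{q}_2 + \mathbf{r}_1)/2$.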

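c. Set $\mathbf{q}_0 = \mathbf{p}_0$ and $\mathbf{r}_3 = \mathbf{p}_3$. Compute $\mathbf{q}_1 = (\mathbf{p}_0 + \mathbf{p}_1)/2$
and $\mathbf{r}_2 = (\mathbf{p}_2 + \mathbf{p}_3)/2$. Compute $\mathbf{m} = (\mathbf{p}_1 + \mathbf{p}_2)/2$.
Compute $\mathbf{q}_2 = (\mathbf{q}_1 + \mathbf{m})/2$ and $\mathbf{r}_1 = (\mathbf{m} + \mathbf{r}_2)/2$.
Compute $\mathbf{q}_3 = (\mathbf{q}_2 + \mathbf{r}_1)/2$ and set $\mathbf{r}_0 = \mathbf{q}_3$.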
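The midpoint steps in part (c) translate directly into code. The sketch below is illustrative only: the names `midpoint`, `subdivide`, and `bezier` and the sample control points are hypothetical, and the check assumes the standard cubic Bezier formula.

```python
def midpoint(a, b):
    """Midpoint of two points given as coordinate tuples."""
    return tuple((x + y) / 2 for x, y in zip(a, b))

def subdivide(p0, p1, p2, p3):
    """Split a cubic Bezier curve at t = 1/2 using only midpoints, as in part (c)."""
    q0, r3 = p0, p3
    q1, r2 = midpoint(p0, p1), midpoint(p2, p3)
    m = midpoint(p1, p2)
    q2, r1 = midpoint(q1, m), midpoint(m, r2)
    q3 = midpoint(q2, r1)
    r0 = q3
    return (q0, q1, q2, q3), (r0, r1, r2, r3)

def bezier(p0, p1, p2, p3, t):
    """Evaluate the standard cubic Bezier curve at parameter t."""
    w = ((1 - t)**3, 3*t*(1 - t)**2, 3*t**2*(1 - t), t**3)
    pts = (p0, p1, p2, p3)
    return tuple(sum(wi * p[k] for wi, p in zip(w, pts)) for k in range(len(p0)))

# Hypothetical control points (an assumption for illustration).
p = ((0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0))
q, r = subdivide(*p)
```

The shared point `q[3]` equals $\mathbf{x}(.5)$, and evaluating the left piece at $t$ reproduces $\mathbf{x}(t/2)$ on the original curve.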




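17. a. $\mathbf{r}_0 = \mathbf{p}_0$, $\mathbf{r}_1 = \dfrac{\mathbf{p}_0 + 2\mathbf{p}_1}{3}$, $\mathbf{r}_2 = \dfrac{2\mathbf{p}_1 + \mathbf{p}_2}{3}$, $\mathbf{r}_3 = \mathbf{p}_2$

b. Hint: Write the standard formula (7) in this section, with
$\mathbf{r}_i$ in place of $\mathbf{p}_i$ for $i = 0, \ldots, 3$, and then replace $\mathbf{r}_0$
and $\mathbf{r}_3$ by $\mathbf{p}_0$ and $\mathbf{p}_2$, respectively:

$\mathbf{x}(t) = (1 - 3t + 3t^2 - t^3)\mathbf{p}_0 + (3t - 6t^2 + 3t^3)\mathbf{r}_1 + (3t^2 - 3t^3)\mathbf{r}_2 + t^3\mathbf{p}_2$    (iii)

Use the formulas for $\mathbf{r}_1$ and $\mathbf{r}_2$ from part (a) to examine
the second and third terms in this expression for $\mathbf{x}(t)$.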
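A numerical check of these control points is sketched below. It assumes the quadratic Bezier curve has the standard form $(1-t)^2\mathbf{p}_0 + 2t(1-t)\mathbf{p}_1 + t^2\mathbf{p}_2$, which is not restated in this answer; the function names and sample points are hypothetical.

```python
def quadratic_bezier(p0, p1, p2, t):
    """Standard quadratic Bezier curve (assumed form)."""
    w = ((1 - t)**2, 2*t*(1 - t), t**2)
    return tuple(sum(wi * p[k] for wi, p in zip(w, (p0, p1, p2))) for k in range(len(p0)))

def cubic_bezier(r0, r1, r2, r3, t):
    """Standard cubic Bezier curve, formula (iii) with general control points."""
    w = (1 - 3*t + 3*t**2 - t**3, 3*t - 6*t**2 + 3*t**3, 3*t**2 - 3*t**3, t**3)
    return tuple(sum(wi * p[k] for wi, p in zip(w, (r0, r1, r2, r3))) for k in range(len(r0)))

def elevate(p0, p1, p2):
    """Cubic control points r0..r3 from part (a)."""
    r1 = tuple((a + 2*b) / 3 for a, b in zip(p0, p1))
    r2 = tuple((2*b + c) / 3 for b, c in zip(p1, p2))
    return p0, r1, r2, p2

# Hypothetical quadratic control points (an assumption for illustration).
p0, p1, p2 = (0.0, 0.0), (2.0, 3.0), (5.0, 1.0)
r = elevate(p0, p1, p2)

max_gap = max(
    abs(a - b)
    for t in (0.0, 0.2, 0.5, 0.9, 1.0)
    for a, b in zip(quadratic_bezier(p0, p1, p2, t), cubic_bezier(*r, t))
)
```

If the formulas in part (a) are right, `max_gap` is zero up to rounding, meaning the cubic curve with control points $\mathbf{r}_0, \ldots, \mathbf{r}_3$ traces the same path as the quadratic curve.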
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Absolute value, complex number, A3 
Accelerator-multiplier model, 253n 
Adjoint, classical, 181 
Adjugate matrix, 181 
Adobe Illustrator, 483 
Affine combinations, 438-446 
definition, 438 
of points, 438-440, 443-444
Affine coordinates. See Barycentric coordinates

Affine dependence, 446-456 
definition, 446 

linear dependence and, 447-448, 454 
Affine hull (affine span), 439, 456 
geometric view of, 443 
of two points, 448 
Affine independence, 446-456 
barycentric coordinates, 449-455 
definition, 446 
Affine set, 441-443, 457 
dimension of, 442 
intersection of, 458 
Affinely dependent, 446 
Aircraft design, 93-94 
Algebraic multiplicity, eigenvalue, 278 
Algorithms 

change-of-coordinates matrix, 242 
compute a B-matrix, 295
decouple a system, 317 
diagonalization, 285-287 
Gram-Schmidt process, 356-362 
inverse power method, 324-326 
Jacobi’s method, 281 
LU factorization, 127-129 
QR algorithm, 326 
reduction to first-order system, 252 
row-column rule for computing 
AB, 96 

row reduction, 15-17 
row-vector rule for computing Ax, 38 
singular value decomposition, 419-422 
solving a linear system, 21 
steady-state vector, 259-262 
writing solution set in parametric vector 
form, 47 
Ampere, 83 

Analysis of variance, 364 
Angles in R^2 and R^3, 337-338
Area 

approximating, 185 
determinants as, 182-184 
ellipse, 186 
parallelogram, 183 


Argument, of a complex number, A5 
Associative law, matrix multiplication, 99 
Associative property, matrix addition, 96 
Astronomy, barycentric coordinates in, 450n 
Attractor, dynamical system, 306, 315-316 
Augmented matrix, 4, 6-8, 18, 21, 38, 440

Auxiliary equation, 250-251 
Average value, 383 
Axioms 

inner product space, 378 
vector space, 192 

B-coordinate vector, 218-220

B-matrix, 291-292, 294-295

B-splines, 486 

Back-substitution, 19-20 

Backward phase, row reduction algorithm, 17 

Barycentric coordinates, 448-453 

Basic variable, pivot column, 18 

Basis 

change of basis 

overview, 241-243 
R^n, 243-244
column space, 213-214 
coordinate systems, 218-219 
eigenspace, 270 

fundamental set of solutions, 314 
fundamental subspaces, 422-423 
null space, 213-214, 233-234 
orthogonal, 340-341 
orthonormal, 344, 358-360, 399, 418 
row space, 233-235 
spanning set, 212 
standard basis, 150, 211, 219, 344
subspace, 150-152, 158 
two views, 214-215 
Basis matrix, 487n 
Basis Theorem, 229-230, 423, 467 
Beam model, 106 
Bessel’s inequality, 392 
Best Approximation Theorem, 352-353 
Best approximation 
Fourier, 389 
P_4, 380-381

to y by elements of W, 352 
Bezier bicubic surface, 489, 491
Bezier curves 

approximations to, 489-490 
connecting two curves, 485-487 
matrix equations, 487-488 
overview, 483-484 
recursive subdivisions, 490-491 


Bezier surfaces 

approximations to, 489-490 
overview, 488-489 
recursive subdivisions, 490-491 
Bezier, Pierre, 483 
Bidiagonal matrix, 133 
Blending polynomials, 487n 
Block diagonal matrix, 122 
Block matrix. See Partitioned matrix 
Block multiplication, 120 
Block upper triangular matrix, 121 
Boeing, 93-94 
Boundary condition, 254 
Boundary point, 467 
Bounded set, 467 
Branch current, 83 
Branch, network, 53 

C (language), 39, 102 
C, A2 

C^n, 300-302
C^3, 310

C^1 geometric continuity, 485
CAD. See Computer-aided design 
Cambridge diet, 81 
Capacitor, 314-315, 318
Caratheodory, Constantin, 459 
Caratheodory’s theorem, 459 
Casorati matrix, 247-248 
Casoratian, 247 
Cauchy-Schwarz inequality, 381-382

Cayley-Hamilton theorem, 328 
Center of projection, 144 
Ceres, 376n 

CFD. See Computational fluid dynamics 
Change of basis, 241-244 
Change of variable 

dynamical system, 308 
principal component analysis, 429 
quadratic form, 404-405 
Change-of-coordinates matrix, 221, 242
Characteristic equation, 278-279 
Characteristic polynomial, 278, 281 
Characterization of Linearly Dependent Sets Theorem, 59

Chemical equation, balancing, 52 
Cholesky factorization, 408 
Classical adjoint, 181 
Closed set, 467-468 
Closed (subspace), 148 
Codomain, matrix transformation, 64
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Coefficient 

correlation coefficient, 338 
filter coefficient, 248 
Fourier coefficient, 389 
of linear equation, 2 
regression coefficient, 371 
trend coefficient, 388 
Coefficient matrix, 4, 38, 136 
Cofactor expansion, 168-169 
Column 

augmented, 110 
determinants, 174 
operations, 174 
pivot column, 152, 157, 214 
sum, 136 
vector, 24 

Column-row expansion, 121 
Column space 
basis, 213-214 
dimension, 230 
null space contrast, 204-206 
overview, 203-204 
subspaces, 149, 151-152 
Comet, orbit, 376 
Conformable partitions, 120
Compact set, 467 

Complex eigenvalue, 297-298, 300-301, 309-310, 317-319
Complex eigenvector, 297 
Complex number, A2-A6 
absolute value, A3 
argument of, A5 
conjugate, A3 

geometric interpretation, A4-A5 
powers of, A6 
R^2, A6
system, A2 

Complex vector, 24n, 299-301 
Complex vector space, 192n, 297, 310 
Composite transformation, 141-142 
Computational fluid dynamics (CFD), 93-94 
Computer-aided design (CAD), 140, 489
Computer graphics 

barycentric coordinates, 451-453 
composite transformation, 141-142 
homogeneous coordinates, 141-142 
perspective projection, 144-146 
three-dimensional graphics, 142-146 
two-dimensional graphics, 140-142 
Condition number, 118, 422 
Conformable partition, 120 
Conjugate, 300, A3 
Consistent system of linear equations, 4, 7-8, 46-47
Constrained optimization problem, 410-415
Consumption matrix, Leontief input-output model, 135-136

Contraction transformation, 67, 75 
Control points, 490-491 
Control system 

control sequence, 266 
controllable pair, 266 


Schur complement, 123 
space shuttle, 189-190 
state vector, 256, 266 
steady-state response, 303 
Controllability matrix, 266 
Convergence, 137, 260 
Convex combinations, 456-463 
Convex hull, 458, 467, 474, 490
Convex set, 458-459 
Coordinate mapping, 218-224 
Coordinate systems 

B-coordinate vector, 218-220 
graphical interpretation of coordinates, 
219-220 

mapping, 221-224 
R^n subspace, 155-157, 220-221
unique representation theorem, 218 
Coordinate vector, 156, 218-219 
Correlation coefficient, 338 
Covariance, 429-430 
Covariance matrix, 428 
Cramer’s rule, 179-180 

engineering application, 180 
inverse formula, 181-182 
Cray supercomputer, 122 
Cross product, 466 
Cross-product formula, 466 
Crystallography, 219-220 
Cubic curves 

Bezier curve, 484 
Hermite cubic curve, 487 
Current, 83-84 
Curve fitting, 23, 373-374, 380-381

Curves. See Bezier curves 
D, 194 

De Moivre’s Theorem, A6 
Decomposition 

eigenvector, 304, 321 
force into component forces, 344 
orthogonal, 341-342 
polar, 434 

singular value, 416-426 
See also Factorization 
Decoupled systems, 314, 317 
Deflection vector, 106-107 
Design matrix, 370 
Determinant, 105 
area, 182-184 

cofactor expansion, 168-169 
column operations, 174 
Cramer’s rule, 179-180 
eigenvalues and characteristic equation of a 
square matrix, 276-278 
linear transformation, 184-186 
linearity property, 175-176 
multiplicative property, 175-176 
overview, 166-167 
recursive definition, 167 
row operations, 171-174 
volume, 182-183 


Diagonal entries, 94 

Diagonal matrix, 94, 122, 283-290, 417-419 
Diagonal matrix Representation Theorem, 293 
Diagonalization, matrix
matrices whose eigenvalues are not distinct, 287-288
orthogonal diagonalization, 420, 426

overview, 283-284 
steps, 285-286 

sufficient conditions, 286-287 
symmetric matrix, 397-399 
theorem, 284 

Diagonalization Theorem, 284 

Diet, linear modeling of weight-loss diet, 81-83
Difference equation. See Linear difference equation

Differential equation 

decoupled systems, 314, 317 
eigenfunction, 314-315 
fundamental set of solutions, 314 
kernel and range of linear 
transformation, 207 
Dilation transformation, 67, 73, 75 
Dimension 

column space, 230 
null space, 230 

R 3 subspace classification, 228-229 
subspace, 155, 157-158
vector space, 227-229 
Dimension of a flat, 442 
Dimension of a set, 442 
Discrete linear dynamical system, 268, 303

Disjoint closed convex set, 468 
Dodecahedron, 437 
Domain, matrix transformation, 64 
Dot product, 38, 332 
Dusky-footed wood rat, 304 
Dynamical system, 64, 267-268 
attractor, 306, 315-316 
decoupling, 317
discrete linear dynamical system, 268, 303
eigenvalue and eigenvector applications, 280-281, 305
evolution, 303 
repeller, 306, 316 
saddle point, 307-309, 316 
spiral point, 319 
trajectory, 305 

Earth Satellite Corporation, 395 
Echelon form, 13-15, 173, 238, 270 
Echelon matrix, 13-14 
Economics, linear system applications, 50-55

Edge, face of a polyhedron, 472 
Effective rank, matrix, 419 
Eigenfunction, differential equation, 314-315 
Eigenspace, 270-271, 399
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Eigenvalue, 269 

characteristic equation of a square 
matrix, 276 

characteristic polynomial, 279 
determinants, 276-278 
finding, 278
complex eigenvalue, 297-298, 300-301, 309-310, 317-319

diagonalization. See Diagonalization, 
matrix 

differential equations. See Differential 
equations 

dynamical system applications, 281 
iterative estimates

inverse power method, 324-326 
power method, 321-324 
quadratic form, 407-408 
similarity transformation, 279 
triangular matrix, 271 
Eigenvector, 269 

complex eigenvector, 297 
decomposition, 304 
diagonalization. See Diagonalization, 
matrix 

difference equations, 273 
differential equations. See Differential 
equations 

dynamical system applications, 281 
linear independence, 272 
linear transformation 

matrix of linear transformation, 
291-292 
R^n, 293-294

similarity of matrix representations, 
294-295 

from V into V , 292 
row reduction, 270 
Eigenvector basis, 284 
Election, Markov chain modeling of outcomes, 257-258, 261
Electrical engineering 

matrix factorization, 129-130 
minimal realization, 131 
Electrical networks, 2, 83-84 
Elementary matrix, 108 
inversion, 109-110 
types, 108 

Elementary reflector, 392 
Elementary row operation, 6, 108-109
Ellipse, 406 
area, 186 

singular values, 417-419 
sphere transformation onto ellipse in R^2, 417-418

Equal vectors, in R^2, 24

Equilibrium price, 50, 52 

Equilibrium vector. See Steady-state vector 

Equivalence relation, 295 

Equivalent linear systems, 3 

Euler, Leonhard, 481

Euler’s formula, 481 


Evolution, dynamical system, 303 
Existence 

linear transformation, 73 
matrix equation solutions, 37-38 
matrix transformation, 65 
system of linear equations, 7-9, 20-21

Existence and Uniqueness Theorem, 21 
Extreme point, 472, 475 

Faces of a polyhedron, 472 
Facet, 472 
Factorization 

analysis of a dynamical system, 283 
block matrices, 122 
complex eigenvalue, 301 
diagonal, 283, 294 
dynamical system, 283 
electrical engineering, 129-131 
See also LU Factorization 
Feasible set, 414 
Feynman, Richard, 165 
Filter coefficient, 248 
Filter, linear, 248-249 
Final demand vector, Leontief input-output model, 134
Finite set, 228 
Finite-dimensional vector 
space, 228 
subspaces, 229-230 
First principal component, 395 
First-order difference equation. See Linear difference equation

First-order equations, reduction to, 252 
Flexibility matrix, 106 
Flight control system, 191 
Floating point arithmetic, 9 
Flop, 20, 127 

Forward phase, row reduction algorithm, 17

Fourier approximation, 389-390 
Fourier coefficient, 389 
Fourier series, 390 
Free variable, pivot column, 18, 20 
Fundamental set of solutions, 251 
differential equations, 314
Fundamental subspace, 239, 337, 422-423

Gauss, Carl Friedrich, 12n, 376n 
Gaussian elimination, 12n 
General least-squares problem, 362-366 
General linear model, 373 
General solution, 18, 251-252 
Geometric continuity, 485 
Geometric descriptions 
R^2, 25-27
span{u,v}, 30-31 
span{v}, 30-31 
vector space, 193 
Geometric interpretation 
complex numbers, A4-A5 
orthogonal projection, 351 


Geometric point, 25 
Geometry of vector space 

affine combinations, 438-446 
affine independence, 446-456 
barycentric coordinates, 448-453 
convex combinations, 456-463 
curves and surfaces, 483-492 
hyperplanes, 463-471 
polytopes, 471-483 
Geometry vector, 488 
Givens rotation, 91

Global Positioning System (GPS), 331-332 
Gouraud shading, 489 
GPS. See Global Positioning System 
Gradient, 464 
Gram matrix, 434 
Gram-Schmidt process 
inner product, 379-380 
orthonormal bases, 358 
QR factorization, 358-360 
steps, 356-358 

Graphical interpretation, coordinates, 219-220 
Gram-Schmidt Process Theorem, 357 

Halley’s Comet, 376 

Hermite cubic curve, 487 

Hermite polynomials, 231 

High-end computer graphics boards, 146 

Homogeneous coordinates 

three-dimensional graphics, 143-144 
two-dimensional graphics, 141-142 
Homogeneous linear systems 
applications, 50-52 
linear difference equations, 248 
solution, 43-45 
Householder matrix, 392 
Householder reflection, 163 
Howard, Alan H., 81 
Hypercube, 479-481 
Hyperplane, 442, 463-471

Icosahedron, 437 
Identity matrix, 39, 108 
Identity for matrix multiplication, 99 
(i, j) -cofactor, 167-168 
Ill-conditioned equations, 366 
Ill-conditioned matrix, 118 
Imaginary axis, A4 
Imaginary numbers, pure, A4 
Imaginary part 

complex number, A2 
complex vector, 299-300 
Inconsistent system of linear equations, 4, 40 
Indefinite quadratic form, 407 
Indifference curve, 414 
Inequality 
Bessel’s, 392 

Cauchy-Schwarz, 381-382 
triangle, 382 
Infinite set, 227n 
Infinite-dimensional vector, 228 
Initial value problem, 314 
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Inner product 
angles, 337 
axioms, 378 
C[a, b], 382-384 
evaluation, 382 
length, 335, 379 
overview, 332-333, 378 
properties, 333 
R^n, 378-379

Inner product space, 378-380 
best approximation in, 380-381 
Cauchy-Schwarz inequality in, 381-382 
definition, 378 
Fourier series, 389-390 
Gram-Schmidt process, 379-380 
lengths in, 379 
orthogonality in, 390 
trend analysis, 387-388 
triangle inequality in, 382 
weighted least-squares, 385-387 
Input sequence, 266 

Inspection, linearly dependent vectors, 59-60 
Interchange matrix, 175 
Interior point, 467 

Intermediate demand, Leontief input-output model, 134-135
International Celestial Reference System, 450n
Interpolated color, 451 
Interpolated polynomial, 23, 162 
Invariant plane, 302 
Inverse, matrix, 104-105 

algorithm for finding A^{-1}, 110

characterization, 113-115 

Cramer’s rule, 181-182 

elementary matrix, 109-110 

flexibility matrix, 106 

invertible matrix, 106-107 

linear transformations, invertible, 115-116 

Moore-Penrose inverse, 424 

partitioned matrix, 121-123 

product of invertible matrices, 108 

row reduction, 110-111 

square matrix, 173 

stiffness matrix, 106 

Inverse power method, iterative estimates for eigenvalues, 324-326
Invertible Matrix Theorem, 114-115, 122, 150, 158-159, 173, 176, 237, 276-277, 423

Isomorphic vector space, 222, 224 
Isomorphism, 157, 222, 380n 
Iterative methods 

eigenspace, 322-324 
eigenvalues, 279, 321-327 
inverse power method, 324-326 
Jacobi’s method, 281 
power method, 321-323 
QR algorithm, 281-282, 326 

Jacobian matrix, 306n 
Jacobi’s method, 281 


Jordan, Wilhelm, 12n
Jordan form, 294 
Junction, network, 53 

k-face, 472 
k-polytope, 472 
k-pyramid, 482 
Kernel, 205-207 
Kirchhoff’s laws, 84, 130 

Ladder network, 130 
Laguerre polynomial, 231 
Lamberson, R., 267-268 
Landsat satellite, 395-396 
LAPACK, 102, 122 
Laplace transform, 180 
Leading entry, 12, 14 
Leading variable, 18n 
Least-squares error, 365 
Least-squares solution, 331 
alternative calculations, 366-367

applications 

curve fitting, 373-374 
general linear model, 373 
least-squares lines, 370-373 
multiple regression, 374-375 
general solution, 362-366 
QR factorization, 366-367 
singular value decomposition, 424 
weighted least-squares, 385-387 
Left distributive law, matrix multiplication, 99 
Left-multiplication, 100, 108-109, 178, 360

Left singular vector, 419 
Length, vector, 333-334, 379 
Leontief, Wassily, 1, 50, 134, 139n
Leontief input-output model 
column sum, 136 
consumption matrix, 135-136 
final demand vector, 134 
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change of basis, 243-244 

dimension of a flat, 442 

distance in, 334-335 

eigenvector basis, 284 

inner product, 378-379 

linear functional, 463 

linear transformations on, 293-294 

orthogonal projection, 349-351 

quadratic form. See Quadratic form 

subspace 

basis, 150-152, 158 
column space, 149, 151-152 
coordinate systems, 155-157, 220-221 
dimension, 155, 157-158 
lines, 149 


null space, 150-151 
properties, 148 
rank, 157-159
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Similarity transformation, 279 
Simplex, 477-479 
Singular matrix, 105, 115-116 
Singular value decomposition (SVD), 416-417, 419-420

applications 

bases for fundamental subspaces, 
422-423 

condition number, 422 
least-squares solution, 424 
reduced decomposition and 
pseudoinverse, 424 
internal structure, 420-422 
R^3 sphere transformation onto ellipse in R^2, 417-418

singular values of a matrix, 418-419 
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System of linear equations ( Continued ) 
overview, 3-4 

parametric descriptions of solution 
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Laguerre polynomials, 231 
Laplace transforms, 124, 180
Legendre polynomials, 383 
Linear transformations in calculus, 206, 292-294
Simplex, 477-479
Splines, WEB 483-486, 492-493
Triangle inequality, 382
Trigonometric polynomials, 389

Numerical Linear Algebra

Band matrix, 133
Block diagonal matrix, 122, 124
Cholesky factorization, WEB 408, 434
Companion matrix, 329
Condition number, 116, 118, 178, 393, 422
Effective rank, WEB 238, 419


Floating point arithmetic, 9, 20, 187
Fundamental subspaces, 239, 337, 422-423
Givens rotation, WEB 91
Gram matrix, 434
Gram-Schmidt process, WEB 361
Hilbert matrix, 118
Householder reflection, 163, 392 
Ill-conditioned matrix (problem), 116, 366 
Inverse power method, 324-326 
Iterative methods, 321-327 
Jacobi’s method for eigenvalues, 281 
LAPACK, 102, 122 

Large-scale problems, 91, 122, WEB 331-332
LU factorization, 126-129, 131-132, 133, 434
Operation counts, 20, WEB 111, 127, WEB 129, 174
Outer products, 103, 121
Parallel processing, 1
Partial pivoting, 17, WEB 129


Polar decomposition, 434 
Power method, 321-324 
Powers of a matrix, WEB 101
Pseudoinverse, 424, 435
QR algorithm, 282, 326
QR factorization, 359-360, WEB 361, WEB 369, 392-393
Rank-revealing factorization, 132, 266, 434
Rank theorem, WEB 235, 240


Rayleigh quotient, 326-327, 393 

Relative error, 393 

Schur complement, 124 

Schur factorization, 393 

Singular value decomposition, 132, 416-426
Sparse matrix, 93, 137, 174
Spectral decomposition, 400-401
Spectral factorization, 132
Tridiagonal matrix, 133
Vandermonde matrix, 162, 188, 329


Vector pipeline architecture, 122 


Physical Sciences 

Cantilevered beam, 254 
Center of gravity, 34 
Chemical reactions, 52, 55
Crystal lattice, 220, 226 
Decomposing a force, 344 
Digitally recorded sound, 247 
Gaussian elimination, 12 
Hooke’s law, 106 
Interpolating polynomial, WEB 23, 162
Kepler’s first law, 376
Landsat image, WEB 395-396


Linear models in geology and geography, 374-375 

Mass estimates for radioactive substances, 376 

Mass-spring system, 198, 216 

Model for glacial cirques, 374 

Model for soil pH, 374 

Pauli spin matrices, 162 

Periodic motion, 297 

Quadratic forms in physics, 403-410 

Radar data, 124 

Seismic data, 2 

Space probe, 124 

Steady-state heat flow, 11, 133

Superposition principle, 67, 84, 314 

Three-moment equation, 254 

Traffic flow, 53-54, 56 

Trend surface, 374 

Weather, 263 

Wind tunnel experiment, 23 

Statistics 

Analysis of variance, 364 
Covariance, 427-429, 430, 431, 432
Full rank, 239 
Least-squares error, 365 
Least-squares line, WEB 331, WEB 369, 370-372
Linear model in statistics, 370-377
Markov chains, WEB 255-264, 281
Mean-deviation form for data, 372, 428
Moore-Penrose inverse, 424
Multichannel image processing, WEB 395-396, 426-434
Multiple regression, 374-375
Orthogonal polynomials, 381
Orthogonal regression, 433-434
Powers of a matrix, WEB 101
Principal component analysis, WEB 395-396, 429-430
Quadratic forms in statistics, 403
Readjusting the North American Datum, WEB 331-332
Regression coefficients, 371
Sums of squares (in regression), 377, 385-386
Trend analysis, 387-388
Variance, 377, 428-429
Weighted least-squares, 378, 385-387


















































